THE CONTINUUM-OF-URNS SCHEME,
GENERALIZED BETA AND INDIAN BUFFET PROCESSES, 
AND HIERARCHIES THEREOF 

DANIEL M. ROY 


Abstract. We describe the combinatorial stochastic process underlying a sequence of conditionally independent Bernoulli processes with a shared beta process hazard measure. As shown by Thibaux and Jordan [TJ07], in the special case when the underlying beta process has a constant concentration function and a finite and nonatomic mean, the combinatorial structure is that of the Indian buffet process (IBP) introduced by Griffiths and Ghahramani [GG05]. By reinterpreting the beta process introduced by Hjort [Hjo90] as a measurable family of Dirichlet processes, we obtain a simple predictive rule for the general case, which can be thought of as a continuum of Blackwell-MacQueen urn schemes (or equivalently, one-parameter Hoppe urn schemes). The corresponding measurable family of Pitman-Yor processes leads to a continuum of two-parameter Hoppe urn schemes, whose ordinary component is the three-parameter IBP introduced by Teh and Görür [TG09], which exhibits power-law behavior, as further studied by Broderick, Jordan, and Pitman [BJP12]. The idea extends to arbitrary measurable families of exchangeable partition probability functions and gives rise to generalizations of the beta process with matching buffet processes. Finally, in the same way that hierarchies of Dirichlet processes were given Chinese restaurant franchise representations by Teh, Jordan, Beal, and Blei [Teh+06], one can construct representations of sequences of Bernoulli processes directed by hierarchies of beta processes (and their generalizations) using the stochastic process we uncover.
1. Introduction

Since the introduction of the Indian buffet process (IBP) by Griffiths and Ghahramani [GG05; GG06] and the characterization of its relationship with beta and Bernoulli processes by Thibaux and Jordan [TJ07], there has been a surge of work extending the IBP in one direction and further exploiting the theory of completely random measures in the other. Despite this attention, a characterization of an urn scheme corresponding to a hierarchy of beta processes has remained elusive, in part because the family of beta distributions is not self-conjugate. By reinterpreting the beta process as a measurable family of Dirichlet processes, we obtain such an urn scheme, which we subsequently generalize by considering arbitrary random measures. As the main example, the urn scheme arising from Pitman-Yor processes not only gives rise to the stable beta process and three-parameter IBP introduced by Teh and Görür [TG09], but also gives rise to a canonical definition for a hierarchical stable beta process.

In this article, we study exchangeable sequences of random sets, their combinatorial structure, and their corresponding de Finetti (mixing) measures. Following [BPJ13], we will refer to the combinatorial structure of a collection of finite sets as a feature allocation. Informally, a feature allocation is a Venn diagram adorned with counts for each component.

It will be convenient to represent random subsets of a space $\Omega$ by random measures on $\Omega$. In particular, a so-called simple point process $X$ of the form $X = \sum_{k<\zeta}\delta_{\gamma_k}$, for some random element $\zeta$ in $\bar{\mathbb{Z}}_+ := \mathbb{Z}_+ \cup \{\infty\}$ and a.s. distinct random elements $\gamma_1, \gamma_2, \dots$ in $\Omega$, will be taken to represent the set $\{\gamma_k : k < \zeta\}$ of its atoms. We will assume that $\Omega$ is locally compact, second countable, and Hausdorff (abbreviated lcscH). Let $\mathcal{A}$ denote the $\sigma$-algebra of its Borel sets. The corresponding $\sigma$-algebra on the space of measures on $(\Omega, \mathcal{A})$ is that generated by the evaluation maps $\pi_A : \mu \mapsto \mu A$, for $A \in \mathcal{A}$. Alternatively, we may think of random subsets as random elements in the space of $\sigma$-finite subsets of $\Omega$, equipped with the $\sigma$-algebra generated by the maps $A \mapsto \#(A' \cap A)$, for $A' \in \mathcal{A}$, where $\#A$ denotes the cardinality of the set $A$. Recall that a random measure $\mu$ is said to be completely random or, equivalently, to have independent increments, when the random variables $\mu(A_1), \dots, \mu(A_k)$ are independent for any disjoint collection $A_1, \dots, A_k$ of measurable subsets.¹ By a hazard measure, we will mean a $\sigma$-finite measure $\mu$ on $(\Omega, \mathcal{A})$ such that $\mu\{s\} \le 1$ for all $s \in \Omega$.

1.1. A discrete model. We begin with a simple model. Fix a finite, purely atomic hazard measure $H_0$, let $\mathscr{A}$ be the set of its atoms, let $\Pi$ be a random partition of $\mathbb{N} := \{1, 2, \dots\}$, and let $\Pi^s$, for $s \in \mathscr{A}$, be independent and identically distributed (i.i.d.) copies of $\Pi$. The partition $\Pi^s$ associated with each atom $s \in \mathscr{A}$ is a random finite or countably infinite collection $C^s_1, C^s_2, \dots$ of disjoint subsets of $\mathbb{N}$, called blocks. Let $U^s_k$, for $s \in \mathscr{A}$ and $k \in \mathbb{N}$, be an i.i.d. collection of uniformly distributed random variables, independent also of the partitions. Then consider the sequence of simple point processes $X_n$, for $n \in \mathbb{N}$, concentrated on $\mathscr{A}$ and given by

$$X_n\{s\} := \mathbf{1}\big(U^s_{k^s_n} < H_0\{s\}\big), \quad \text{where } k^s_n \text{ satisfies } n \in C^s_{k^s_n}. \tag{1.1}$$

Informally, every block in the partition $\Pi^s$ “decides” independently with probability $H_0\{s\}$ whether or not to “take” the atom $s \in \mathscr{A}$. Then the set $X_n$ has the atom $s$ if and only if the block containing $n$ in $\Pi^s$ has itself “taken” the atom. As constructed, $\mathbb{E}X_n = H_0$ and $X_n\{s\} = X_m\{s\}$ whenever $n, m$ are in the same block of $\Pi^s$.

¹ We have adopted the framework for random measures, point processes, Poisson processes, and more generally, random measures with independent increments (also known as completely random measures) laid out by Kallenberg [Kal02, Chp. 12]. This background, as well as some results on beta and Bernoulli processes, two classes of completely random measures of particular interest in the first part of the article, is presented in Section 2.
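The following Python sketch simulates the discrete model of Eq. (1.1) for finitely many atoms. Purely for illustration, it assumes that each $\Pi^s$ is a one-parameter CRP partition with concentration `theta`. Only the restriction of each partition to $[n]$ is generated. The function names `crp_partition` and `discrete_model` are hypothetical.

```python
import random


def crp_partition(n, theta, rng):
    """Restriction to [n] (n >= 1) of a one-parameter CRP(theta) partition.

    Returns labels 0, 1, 2, ... in order of first appearance, so that
    labels[j] is the block containing element j + 1.
    """
    labels, sizes = [], []
    for i in range(n):
        # With i elements placed: open a new block w.p. theta / (theta + i),
        # join existing block k w.p. sizes[k] / (theta + i).
        r = rng.random() * (theta + i)
        if r < theta:
            labels.append(len(sizes))
            sizes.append(1)
            continue
        r -= theta
        k = 0
        while k < len(sizes) - 1 and r >= sizes[k]:
            r -= sizes[k]
            k += 1
        labels.append(k)
        sizes[k] += 1
    return labels


def discrete_model(hazard, n, theta, rng):
    """Simulate X_1, ..., X_n from Eq. (1.1).

    hazard maps each atom s to H0{s} in (0, 1]. Returns the list of sets
    X_1..X_n and the sampled block labels of each Pi^s.
    """
    X = [set() for _ in range(n)]
    partitions = {}
    for s, h in hazard.items():
        labels = crp_partition(n, theta, rng)  # independent copy Pi^s
        partitions[s] = labels
        # One uniform U_k^s per block: block k "takes" s iff U_k^s < H0{s}.
        takes = [rng.random() < h for _ in range(max(labels) + 1)]
        for j, k in enumerate(labels):
            if takes[k]:
                X[j].add(s)
    return X, partitions
```

All elements of a block share that block's uniform. Consequently, `X[j]` and `X[m]` agree on `s` whenever `j` and `m` lie in the same block of $\Pi^s$, and each $X_n\{s\}$ is still marginally Bernoulli with mean $H_0\{s\}$.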

We are interested in the law of $(X_n)_{n\in\mathbb{N}}$ under the additional assumption that the random partition $\Pi$ is exchangeable, in the sense that its distribution is invariant under every permutation of the underlying set $\mathbb{N}$. More carefully, by a random partition of $\mathbb{N}$ we mean a $\{0,1\}$-valued process $\Pi := (\Pi_{ij})_{i,j\in\mathbb{N}}$ such that the random set $\{(i,j) \in \mathbb{N}^2 : \Pi_{ij} = 1\}$ is an equivalence relation on $\mathbb{N}$ with probability one. We say that a random partition $\Pi$ is exchangeable when, for all permutations $\sigma$ of $\mathbb{N}$,

$$(\Pi_{\sigma(i)\sigma(j)})_{i,j\in\mathbb{N}} \overset{d}{=} (\Pi_{ij})_{i,j\in\mathbb{N}}. \tag{1.2}$$

If $\Pi$ is exchangeable, one can show that the sequence $(X_n)_{n\in\mathbb{N}}$ is itself exchangeable, and thus conditionally i.i.d. In particular, there exists a completely random, purely atomic hazard measure $H$, concentrated on $\mathscr{A}$ and given by

$$HA = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} X_j A \quad \text{a.s.}, \qquad A \in \mathcal{A}, \tag{1.3}$$

such that, conditioned on $H$, the $X_n$ are i.i.d. and completely random with mean $H$. It follows that $H$ is the a.s. unique random measure with this property. (We will say that $H$ directs $(X_n)_{n\in\mathbb{N}}$.)

The distribution of $(X_n)_{n\in\mathbb{N}}$ and the law of the directing random hazard measure $H$ are measurable functions of $H_0$ and the distribution of the random partition $\Pi$. As one example, if $\Pi$ is a random partition induced by a one-parameter Chinese restaurant process (CRP) [Pit06], then $H$ is the fixed-atomic component of a beta process [Hjo90; Kim99; TJ07] with mean $H_0$. Write $Q_{H_0}$ for the distribution of $(X_n)_{n\in\mathbb{N}}$, where we have highlighted only its dependence on $H_0$.

1.2. The continuum limit. The main focus of this article is the characterization of a sequence of random measures whose distribution can be obtained by the following limit construction. Let $H_0^{(1)}, H_0^{(2)}, \dots$ be a sequence of purely atomic hazard measures on $(\Omega, \mathcal{A})$ converging strongly to a $\sigma$-finite, though not necessarily purely atomic, measure $H_0$, and write $Q_{H_0}$ for the weak limit of the distributions $Q_{H_0^{(k)}}$ as $k \to \infty$. We will call a sequence $(X_n)_{n\in\mathbb{N}}$ of random measures with distribution $Q_{H_0}$ a (homogeneous) continuum-of-urns scheme with hazard measure $H_0$.

For the remainder of the section, we present our main results characterizing the continuum limit. In Section 3, we will give a direct construction of the continuum-of-urns scheme in the special case where the random partition is that induced by a Chinese restaurant process. In Section 6, we give a direct construction of a general nonhomogeneous continuum-of-urns scheme without appealing to a limiting argument; here nonhomogeneity means that the distribution of the random partition $\Pi$ is allowed to vary across $\Omega$. In Section 9, we show that the weak continuum limit outlined above agrees with these constructions.

Theorem 1.1. Let $(X_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme with hazard measure $H_0$. Then $(X_n)_{n\in\mathbb{N}}$ is exchangeable, and thus conditionally i.i.d. In particular, there exists an a.s. unique random hazard measure $H$, given by Eq. (1.3), such that, conditioned on $H$, the $X_n$, for $n \in \mathbb{N}$, are i.i.d. and completely random with mean $H$.

Let $(X_n)_{n\in\mathbb{N}}$ and $H$ be given as above. We will say that $H$ directs $(X_n)_{n\in\mathbb{N}}$, and we will call such a random hazard measure a (homogeneous) generalized beta process. (Nonhomogeneous generalized beta processes will arise as the random hazard measures directing nonhomogeneous continuum-of-urns schemes.) Before we can characterize the law of such processes, we must introduce a few notions from the theory of exchangeable sequences. (We will develop these concepts further in Section 4.)

Let $\Pi = \{C_1, C_2, \dots\}$ be an exchangeable partition of $\mathbb{N}$, where $C_1$ is the block containing 1 and $C_{k+1}$, for $k \in \mathbb{N}$, is the block that, when nonempty, contains the least integer not in $C_1 \cup \dots \cup C_k$. Let $[n] := \{1, \dots, n\}$ and let $N_{jn} := \#([n] \cap C_j)$, for $j \in \mathbb{N}$, be the number of elements of block $C_j$ among $[n]$. Then the limiting relative frequency of elements in block $C_k$, i.e.,

$$P_k := \lim_{n\to\infty} \frac{N_{kn}}{n} \quad \text{a.s.}, \tag{1.4}$$

exists almost surely for every $k \in \mathbb{N}$. Let $\varphi_1$ be the structural distribution, i.e., the distribution of the first size-biased pick $P_1$, let

$$\Delta := \mathbb{E}\Big(1 - \sum_{n=1}^{\infty} P_n\Big) \tag{1.5}$$

be the expected limiting frequency of dust, i.e., of singleton blocks in $\Pi$, and let $\kappa(q, \cdot)$ be the distribution of

$$\sum_{n=1}^{\infty} P_n\,\mathbf{1}(U_n < q) + \Big(1 - \sum_{n=1}^{\infty} P_n\Big)\,q, \tag{1.6}$$

where $(U_n)_{n\in\mathbb{N}}$ is an independent i.i.d. process of uniformly distributed random variables (cf. the discrete model). We then have the following:

Theorem 1.2. Let $H$ be the random hazard measure directing a continuum-of-urns scheme with hazard measure $H_0$, and let $\mathscr{A}_0$ and $\bar H_0$ be the atoms and the nonatomic part of $H_0$, respectively. Then $H$ is completely random and can be written

$$H = \Delta \bar H_0 + \sum_{s\in\mathscr{A}_0} p_s\,\delta_s + \sum_{(s,p)\in\eta} p\,\delta_s \quad \text{a.s.}, \tag{1.7}$$

where $(p_s)_{s\in\mathscr{A}_0}$ is a process of independent random variables such that $p_s$ has distribution $\kappa(H_0\{s\}, \cdot)$, and $\eta$ is a Poisson process on $\Omega \times (0,1]$, independent of $(p_s)_{s\in\mathscr{A}_0}$, with intensity measure

$$(ds, dp) \mapsto \bar H_0(ds)\,p^{-1}\varphi_1(dp). \tag{1.8}$$

Following convention in the study of completely random measures, we will call the three components of $H$ appearing in Eq. (1.7) the nonrandom nonatomic, fixed-atomic, and ordinary components, respectively. When $\Delta < 1$ and $\bar H_0 \neq 0$, the measure described by Eq. (1.8) is merely $\sigma$-finite and not finite. In this case, the ordinary component has an infinite number of atoms with probability one.
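The fixed-atomic weights $p_s \sim \kappa(H_0\{s\}, \cdot)$ can be drawn directly from Eq. (1.6). The sketch below assumes a one-parameter CRP partition. For that case, the discussion following Theorem 1.3 states that $\kappa(q,\cdot) = \mathrm{Beta}(\theta q, \theta(1-q))$ and that the limiting frequencies have the stick-breaking form of Eq. (1.11). The helper name `sample_kappa` and the truncation tolerance `tol` are illustrative choices, not part of the article.

```python
import random


def sample_kappa(q, theta, rng, tol=1e-12):
    """One draw from kappa(q, .) in Eq. (1.6), assuming a CRP(theta) partition.

    Limiting frequencies are generated by stick-breaking,
    P_n = V_n * prod_{j<n} (1 - V_j) with V_j ~ Beta(1, theta),
    and the loop stops once the unallocated mass drops below tol.
    """
    total, remaining = 0.0, 1.0
    while remaining > tol:
        p = rng.betavariate(1.0, theta) * remaining  # next frequency P_n
        remaining -= p
        if rng.random() < q:  # block n "takes" the atom iff U_n < q
            total += p
    # A CRP has no dust; the tiny truncated remainder is weighted by q,
    # mirroring the dust term of Eq. (1.6).
    return total + remaining * q
```

A quick sanity check is to compare the sample mean and variance with those of $\mathrm{Beta}(\theta q, \theta(1-q))$, namely $q$ and $q(1-q)/(\theta+1)$.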

The ordinary component $\tilde H$ of the directing random hazard measure $H$ can be related to the a.s. limiting frequencies $(P_n)_{n\in\mathbb{N}}$ of the underlying random partition.


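Theorem 1.3. Let $A \in \mathcal{A}$ be such that $\gamma := \bar H_0(A) < \infty$. Then

$$\tilde H(\cdot \cap A) = \sum_{t=1}^{\infty}\sum_{j=1}^{C_t} P_{tj}\,\delta_{s_{tj}} \quad \text{a.s.}, \tag{1.9}$$

for some independent processes $(C_t)_{t\in\mathbb{N}}$, $(P_{tj})_{t,j\in\mathbb{N}}$, and $(s_{tj})_{t,j\in\mathbb{N}}$ such that:

(1) $(C_t)_{t\in\mathbb{N}}$ are i.i.d. Poisson random variables with mean $\gamma$;

(2) $(P_{tj})_{j\in\mathbb{N}}$, for $t \in \mathbb{N}$, are independent collections of i.i.d. random variables such that $P_{tj} \overset{d}{=} P_t$;

(3) and $(s_{tj})_{t,j\in\mathbb{N}}$ are i.i.d. random variables with distribution $\gamma^{-1}\bar H_0(\cdot \cap A)$.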





At this point, we can draw out several connections with well-known stochastic processes; more details are given in Sections 6 and 8. If $\Pi$ is a partition induced by a one-parameter CRP with concentration parameter $\theta$, then $H$ is a beta process [Hjo90] with mean $H_0$. In particular, $\Delta = 0$, and so there is no nonrandom nonatomic component; $\kappa(q, \cdot) = \mathrm{Beta}(\theta q, \theta(1-q))$; and $\varphi_1 = \mathrm{Beta}(1, \theta)$, and so $\eta$ has intensity

$$(ds, dp) \mapsto \bar H_0(ds)\,\theta p^{-1}(1-p)^{\theta-1}\,dp. \tag{1.10}$$

It can be shown that

$$P_n = V_n \prod_{j=1}^{n-1}(1 - V_j) \quad \text{a.s.}, \tag{1.11}$$

where $(V_n)_{n\in\mathbb{N}}$ are i.i.d. and $V_1 \sim \mathrm{Beta}(1, \theta)$. Combined with Theorem 1.3, this yields the so-called stick-breaking construction of the beta process given by Paisley, Zaas, Woods, Ginsburg, and Carin [Pai+10].
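Combining Theorem 1.3 with Eq. (1.11) gives a simple sampler for the ordinary component on a set $A$ with $\gamma = \bar H_0(A) < \infty$. The sketch below keeps only the first `rounds` values of $t$. As an illustrative assumption, it draws atom locations uniformly on $[0,1]$ in place of $\gamma^{-1}\bar H_0(\cdot \cap A)$. Poisson variates come from a small hand-written helper, because Python's `random` module has no Poisson sampler. All function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def beta_process_stick_breaking(gamma, theta, rounds, rng):
    """Atoms (s, p) of the ordinary component on A, via Eq. (1.9).

    Round t contributes C_t ~ Poisson(gamma) atoms. Each weight is an
    independent copy of P_t = V_t * prod_{i<t} (1 - V_i), V_i ~ Beta(1, theta),
    as in Eq. (1.11). Locations are uniform on [0, 1] (illustrative stand-in).
    """
    atoms = []
    for t in range(1, rounds + 1):
        for _ in range(poisson(gamma, rng)):
            sticks = [rng.betavariate(1.0, theta) for _ in range(t)]
            weight = sticks[-1] * math.prod(1.0 - v for v in sticks[:-1])
            atoms.append((rng.random(), weight))
    return atoms
```

Since $\mathbb{E}P_t = (1+\theta)^{-1}(\theta/(1+\theta))^{t-1}$, the expected mass retained after $T$ rounds is $\gamma\,(1 - (\theta/(1+\theta))^T)$. This quantity is a convenient check on the sampler.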

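If, on the other hand, $\Pi$ is a partition induced by a two-parameter CRP [Pit96; Eng78], with concentration parameter $\theta$ and discount parameter $\alpha$, then $\Delta = 0$; $\kappa(q, \cdot)$ is the law of $\sum_{n} \tilde P_n T_n$, where $(\tilde P_n)_{n\in\mathbb{N}}$ is a size-biased permutation of a sequence with the two-parameter Poisson-Dirichlet distribution and $(T_n)_{n\in\mathbb{N}}$ is an independent collection of i.i.d. $\mathrm{Bernoulli}(q)$ random variables; and $\varphi_1 = \mathrm{Beta}(1 - \alpha, \theta + \alpha)$, and so $\eta$ has intensity

$$(ds, dp) \mapsto \bar H_0(ds)\,\frac{\Gamma(1+\theta)}{\Gamma(1-\alpha)\,\Gamma(\theta+\alpha)}\,p^{-1-\alpha}(1-p)^{\theta+\alpha-1}\,dp. \tag{1.12}$$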

The ordinary component is thus a stable beta process, as defined by Teh and Görür [TG09]. The a.s. limiting frequencies $(P_n)_{n\in\mathbb{N}}$ satisfy Eq. (1.11), but for merely independent random variables $V_n$, for $n \in \mathbb{N}$, where $V_n \sim \mathrm{Beta}(1 - \alpha, \theta + n\alpha)$. Therefore, Theorem 1.3 recovers the stick-breaking construction of the stable beta process given by Broderick, Jordan, and Pitman [BJP12]. (These authors refer to the same process as a three-parameter beta process.)
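To see the power-law behavior that the discount $\alpha$ introduces, one can simulate the two-parameter CRP itself. The sketch below uses the standard seating rule. With $k$ tables and $i$ customers seated, a new table opens with probability $(\theta + k\alpha)/(\theta + i)$, and table $j$ is joined with probability $(n_j - \alpha)/(\theta + i)$. The name `two_parameter_crp` is hypothetical, and the sketch assumes $0 \le \alpha < 1$ and $\theta > 0$.

```python
import random


def two_parameter_crp(n, alpha, theta, rng):
    """Seat n customers in a two-parameter CRP; return table sizes in order of appearance.

    Assumes 0 <= alpha < 1 and theta > 0.
    """
    sizes = []
    for i in range(n):
        k = len(sizes)
        r = rng.random() * (theta + i)
        if r < theta + k * alpha:  # open a new table
            sizes.append(1)
            continue
        r -= theta + k * alpha  # remaining weight sum(size_j - alpha) = i - k*alpha
        j = 0
        while j < k - 1 and r >= sizes[j] - alpha:
            r -= sizes[j] - alpha
            j += 1
        sizes[j] += 1
    return sizes
```

For $\alpha > 0$, the number of tables grows roughly like $n^{\alpha}$, compared with logarithmic growth when $\alpha = 0$. Even for moderate $n$ the gap is large.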

Even though Teh and Görür did not define a fixed-atomic component, in our opinion it would be natural to use the term “stable beta process” to refer to the class of random measures $H$ arising from two-parameter CRPs in this way. In Section 8.5, we discuss the omission of a fixed-atomic component in [TG09], and the fact that our definition of the fixed-atomic component differs from that proposed by Broderick, Mackey, Paisley, and Jordan [Bro+11].

In the case of beta processes and stable beta processes, one can give a characterization of the ordinary component as a countably infinite sum of completely random measures, each finitely supported and accounting for the atoms appearing for the first time at each stage $X_1, X_2, \dots$ of a continuum-of-urns scheme. These representations have proven useful in applications, in part because they are extremely simple to generate and yield finite approximation bounds.

These constructions can be extended to generalized beta processes. For every integer $n \ge 0$, let $\rho^{(n)}(p) := (1-p)^n$. We can then write $\rho^{(n)}\varphi_1$ for the Borel measure on $[0,1]$ given by

$$(\rho^{(n)}\varphi_1)(B) = \int_B (1-p)^n\,\varphi_1(dp), \qquad B \in \mathcal{B}_{[0,1]}. \tag{1.13}$$

We can also give a combinatorial interpretation of this measure. Let $K_n$ be the number of blocks in the random partition $\Pi$ restricted to $[n]$. The event $\{K_n > K_{n-1}\}$ is the event that $n$ is a new token, i.e., that $n$ opens a new block. On this event, $P_{K_n}$ is the a.s. limiting frequency of appearance of this new token in the remainder of the sequence. Lemma 4.4 shows that

$$(\rho^{(n-1)}\varphi_1)(B) = \mathbb{P}\{P_{K_n} \in B \wedge K_n > K_{n-1}\} \tag{1.14}$$

for every $n \in \mathbb{N}$ and $B \in \mathcal{B}_{[0,1]}$.
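Eq. (1.14) can be checked numerically for a one-parameter CRP. There, $(\rho^{(n-1)}\varphi_1)[0,1] = \theta/(\theta+n-1)$ and $\int p\,(\rho^{(n-1)}\varphi_1)(dp) = \theta/((\theta+n-1)(\theta+n))$. The sketch below generates the partition by the paintbox construction: it draws stick weights as in Eq. (1.11), and each element independently picks stick $j$ with probability $P_j$. On the event that element $n$ opens a new block, it records that block's frequency. The truncation tolerance and the function name are illustrative assumptions.

```python
import random


def check_eq_1_14(n, theta, trials, rng, tol=1e-8):
    """Monte Carlo estimates of P{K_n > K_{n-1}} and E[P_{K_n}; K_n > K_{n-1}]
    for a CRP(theta) partition generated by a (truncated) stick-breaking paintbox."""
    hits, mass = 0, 0.0
    for _ in range(trials):
        weights, remaining = [], 1.0
        while remaining > tol:
            v = rng.betavariate(1.0, theta)
            weights.append(v * remaining)  # P_j from Eq. (1.11)
            remaining *= 1.0 - v
        # Elements 1..n each pick a stick; equal picks means same block.
        picks = rng.choices(range(len(weights)), weights=weights, k=n)
        if picks[-1] not in picks[:-1]:  # element n opens a new block
            hits += 1
            mass += weights[picks[-1]]  # its limiting frequency P_{K_n}
    return hits / trials, mass / trials
```

Both estimates should approach the closed forms above as `trials` grows. The truncated leftover mass is simply renormalized away by `random.choices`.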

The identity $p^{-1} = \sum_{n=0}^{\infty}(1-p)^n$, for $p \in (0,1]$, yields the following construction of the ordinary component of generalized beta processes:


Theorem 1.4. Let $H$ be the random hazard measure directing a continuum-of-urns scheme, and let $\eta$ be the Poisson process underlying the ordinary component as in Theorem 1.2. Then

$$\sum_{(s,p)\in\eta} p\,\delta_s = \sum_{n=1}^{\infty}\sum_{(s,p)\in\eta_n} p\,\delta_s \quad \text{a.s.}, \tag{1.15}$$

for some collection $\eta_n$, for $n \in \mathbb{N}$, of independent Poisson processes on $\Omega \times (0,1]$ with intensities

$$(\mathbb{E}\eta_n)(ds, dp) = \bar H_0(ds)\,(\rho^{(n-1)}\varphi_1)(dp). \tag{1.16}$$

Note that for every $A \in \mathcal{A}$ such that $\bar H_0(A) < \infty$, we have $\eta_n(A \times (0,1]) < \infty$ a.s.
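For the one-parameter CRP case $\varphi_1 = \mathrm{Beta}(1,\theta)$, Eq. (1.16) is explicit: $(\rho^{(n-1)}\varphi_1)(dp) = \theta(1-p)^{\theta+n-2}\,dp$. Hence, when $\bar H_0(\Omega) = \gamma < \infty$, the process $\eta_n$ has $\mathrm{Poisson}(\gamma\theta/(\theta+n-1))$ atoms with i.i.d. $\mathrm{Beta}(1, \theta+n-1)$ weights. The sketch below samples the first `rounds` of the $\eta_n$ under that assumption, with locations uniform on $[0,1]$ standing in for $\gamma^{-1}\bar H_0$. The function names are hypothetical. This appears to be the round-by-round form that Section 2.4 attributes to [TJ07], but the sketch is only an illustration of Eq. (1.16).

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def beta_process_by_rounds(gamma, theta, rounds, rng):
    """Atoms (s, p) of eta_1, ..., eta_rounds from Eq. (1.16) with phi_1 = Beta(1, theta)."""
    atoms = []
    for n in range(1, rounds + 1):
        rate = gamma * theta / (theta + n - 1)  # gamma * (rho^{(n-1)} phi_1)(0, 1]
        for _ in range(poisson(rate, rng)):
            weight = rng.betavariate(1.0, theta + n - 1)  # normalized rho^{(n-1)} phi_1
            atoms.append((rng.random(), weight))
    return atoms
```

Summing $\gamma\theta/((\theta+n-1)(\theta+n))$ over $n \le T$ telescopes to an expected retained mass of $\gamma T/(\theta+T)$.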

The following result gives approximation error bounds when the above construction is truncated at a finite stage:

Theorem 1.5. Assume $\gamma := \bar H_0(\Omega) < \infty$, and let $H$ be the random hazard measure directing a continuum-of-urns scheme $(X_n)_{n\in\mathbb{N}}$. Let

$$H_k := \Delta \bar H_0 + \sum_{m=1}^{k-1}\sum_{(s,p)\in\eta_m} p\,\delta_s \tag{1.17}$$

be the finite truncation of $H$, i.e., the sum of only the first $k-1$ terms of the right-hand side of Eq. (1.15), and let $\tilde X_1$ be the restriction of $X_1$ to the complement of the support of $H - H_k$. Then the expected total mass of the ordinary component of $H - H_k$, and, equivalently, an upper bound on the probability that $\tilde X_1 \neq X_1$, is $\gamma\,(\rho^{(k-1)}\varphi_1)(0,1]$.
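The bound in Theorem 1.5 is a one-dimensional integral. For $\varphi_1 = \mathrm{Beta}(1-\alpha, \theta+\alpha)$ (the two-parameter CRP case, with $\alpha = 0$ giving the beta process), it equals $\gamma\,B(1-\alpha, \theta+\alpha+k-1)/B(1-\alpha, \theta+\alpha)$. The sketch below evaluates it with `math.lgamma` and finds how many rounds keep the bound below a tolerance. The function names are hypothetical.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def truncation_bound(gamma, theta, k, alpha=0.0):
    """gamma * (rho^{(k-1)} phi_1)(0, 1] for phi_1 = Beta(1 - alpha, theta + alpha)."""
    a, b = 1.0 - alpha, theta + alpha
    return gamma * math.exp(log_beta(a, b + k - 1) - log_beta(a, b))


def rounds_needed(gamma, theta, eps, alpha=0.0):
    """Smallest number of retained rounds k - 1 with truncation_bound <= eps."""
    k = 1
    while truncation_bound(gamma, theta, k, alpha) > eps:
        k += 1
    return k - 1
```

For $\alpha = 0$, the bound reduces to $\gamma\theta/(\theta+k-1)$, so the error decays only like $1/k$. A positive discount slows the decay further.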

1.3. The underlying combinatorial stochastic process. In applications to latent feature models and the theory of exchangeable feature allocations, the combinatorial structure of a continuum-of-urns scheme is of primary interest. Let $(X_n)_{n\in\mathbb{N}}$ be a homogeneous continuum-of-urns scheme. For $n \in \mathbb{N}$ and $h \in \mathcal{H}_n := \{0,1\}^n \setminus \{0^n\}$, let $s(h) := \sum_{j=1}^{n} h(j)$ denote the number of nonzero entries, and let $M_h$ be the number of elements $s$ such that $(\forall j \le n)\ X_j\{s\} = h(j)$. For every (Borel) automorphism $\phi$ on $\Omega$, we can define the transformed processes $X_n^{\phi} := X_n \circ \phi^{-1}$, for $n \in \mathbb{N}$, where each atom $s$ is repositioned to $\phi(s)$. Note that the counts $M_h$, for $h \in \mathcal{H}_n$, are invariant to this transformation, and it is in this sense that they capture only the combinatorial structure. Let $[(X_1, \dots, X_n)]$ denote $(M_h : h \in \mathcal{H}_n)$. In Section 7, we prove the following:


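Theorem 1.6. Let $H_0$ be nonatomic, and let $\gamma := H_0(\Omega) < \infty$. Then

$$\mathbb{P}\{[(X_1, \dots, X_n)] = (m_h : h \in \mathcal{H}_n)\} = \exp\Big(-\gamma \sum_{j=1}^{n} f(j,1)\Big) \prod_{h\in\mathcal{H}_n} \frac{\big(\gamma f(n, s(h))\big)^{m_h}}{m_h!}, \tag{1.18}$$

where

$$f(n,k) := \int_{[0,1]} p^{k-1}(1-p)^{n-k}\,\varphi_1(dp), \qquad k \le n \in \mathbb{N}. \tag{1.19}$$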

Remark 1.7. The following identities relate $f(n,k)$ to combinatorial events in the underlying exchangeable partition. Let $h \in \mathcal{H}_n$ be such that $s(h) = k$, let $\Pi_n$ be the restriction of $\Pi$ to $[n]$, and recall the definition of $N_{jn}$ above. Then, by exchangeability,

$$f(n,k) = \mathbb{P}\{h^{-1}(1) \in \Pi_n\} \tag{1.20}$$

$$= \mathbb{P}\{N_{1k} = k \wedge N_{1n} = k\} = \mathbb{P}\{N_{K_{n-k}+1,\,n} = k\} \tag{1.21}$$

$$= \binom{n-1}{k-1}^{-1} \mathbb{P}\{N_{1n} = k\} = \binom{n}{k}^{-1} \mathbb{E}\,\#\{j : N_{jn} = k\}. \tag{1.22}$$

These identities may be simpler to work with than Eq. (1.19). ◁


It is worth pausing to highlight the connection with the IBP. If $\Pi$ is a partition induced by a one-parameter CRP with concentration parameter $\theta$, then $\varphi_1 = \mathrm{Beta}(1, \theta)$, and so $f(n,k) = \theta\,\Gamma(k)\,\Gamma(n-k+\theta)/\Gamma(n+\theta)$. The resulting p.m.f. is then precisely that of the two-parameter IBP [GGS07], with concentration $\theta$ and mass $\gamma$. Taking $\theta = 1$, one recovers the original IBP, proposed by [GG05; GG06]. If, on the other hand, $\Pi$ is a partition induced by a two-parameter CRP with concentration parameter $\theta$ and discount parameter $\alpha$, one recovers the three-parameter IBP proposed by Teh and Görür [TG09].
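The p.m.f. in Eq. (1.18) is straightforward to evaluate once $f$ is known. The sketch below assumes $\varphi_1 = \mathrm{Beta}(1-\alpha, \theta+\alpha)$, which covers the one-parameter ($\alpha = 0$) and two-parameter CRP cases discussed above. For this $\varphi_1$, $f(n,k) = B(k-\alpha, n-k+\theta+\alpha)/B(1-\alpha, \theta+\alpha)$. The input format (one 0/1 pattern per feature) and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def f(n, k, theta, alpha=0.0):
    """Eq. (1.19) when phi_1 = Beta(1 - alpha, theta + alpha)."""
    return math.exp(log_beta(k - alpha, n - k + theta + alpha)
                    - log_beta(1.0 - alpha, theta + alpha))


def log_pmf(n, histories, gamma, theta, alpha=0.0):
    """Log of Eq. (1.18).

    histories lists, for each feature, its 0/1 pattern h across X_1..X_n;
    identical patterns are pooled into the counts m_h.
    """
    counts = Counter(tuple(h) for h in histories)
    out = -gamma * sum(f(j, 1, theta, alpha) for j in range(1, n + 1))
    for h, m in counts.items():
        if len(h) != n or not any(h):
            raise ValueError("each pattern must lie in {0,1}^n minus the zero vector")
        out += m * math.log(gamma * f(n, sum(h), theta, alpha)) - math.lgamma(m + 1)
    return out
```

Two checks follow from the formula. For $n = 1$, every pattern is $(1)$ and $f(1,1) = 1$, so Eq. (1.18) reduces to a $\mathrm{Poisson}(\gamma)$ p.m.f. For $\theta = 1$ and $\alpha = 0$, $f(n,k) = (k-1)!\,(n-k)!/n!$ and $\sum_{j\le n} f(j,1)$ is the harmonic number, as in the original IBP.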

The remainder of the article is organized as follows. In Section 3, we define a one-parameter scheme and show that it is an exchangeable sequence of Bernoulli processes directed by a beta process. It follows that its combinatorial structure is an IBP. More importantly, the combinatorial structure of a hierarchy of one-parameter schemes, which corresponds to a hierarchy of beta processes, is seen to be the missing hierarchical version of the IBP. In Section 4, we introduce some necessary preliminaries on exchangeable sequences and their directing random measures. In Section 6, we define the continuum-of-urns scheme with respect to a measurable family of exchangeable partition probability functions (EPPFs) and show that the resulting sequence of simple point processes is exchangeable. We further show that it corresponds to an exchangeable sequence of Bernoulli processes directed by a generalization of the beta process, and we characterize its ordinary, fixed-atomic, and nonrandom nonatomic (due to dust) components in terms of the EPPFs. We end the section by describing the IBP analog. In Section 8, we consider the EPPF corresponding to the two-parameter Chinese restaurant process. This produces a two-parameter continuum-of-urns scheme, which we show corresponds to an exchangeable sequence of Bernoulli processes directed by a generalization of the stable beta process. The combinatorial process is shown to be the three-parameter IBP introduced by Teh and Görür. Finally, in Section 9, we return to the limiting construction alluded to in this introduction and show that a general continuum-of-urns scheme can be obtained as a weak limit of finite processes.


2. Preliminaries 

In this section we very briefly review some definitions and results from the theory 
of completely random measures; define beta and Bernoulli processes; and develop 
a few additional properties of Bernoulli processes. 

2.1. Random measures; point processes; Poisson processes. We fix a basic probability space $(\Xi, \mathcal{F}, \mathbb{P})$, which one can assume to be the unit interval, equipped with the $\sigma$-algebra of Lebesgue-measurable sets, under Lebesgue measure. The following setup is (essentially) taken from Kallenberg [Kal02, Chp. 12]. We reproduce it here (indented, with several small modifications) for completeness.

Consider an arbitrary measurable space $(\Omega, \mathcal{A})$. We say that $\xi$ is a random measure on $(\Omega, \mathcal{A})$ when it is a $\sigma$-finite kernel from the basic probability space $(\Xi, \mathcal{F}, \mathbb{P})$ into $\Omega$, i.e., $\xi$ is a map from $\Xi \times \mathcal{A}$ to $[0, \infty]$ such that:
(1) $\xi(x, \cdot)$ is a measure for all $x \in \Xi$;

(2) $\xi A := \xi(\cdot, A)$ is a random variable for all $A \in \mathcal{A}$; and

(3) for some partition $A_1, A_2, \dots \in \mathcal{A}$ of $\Omega$, $\xi A_k < \infty$ a.s. for all $k$.

(Note that we require a single partition to witness the $\sigma$-finiteness of $\xi$.) We may consider $\xi$ to be a random element in the space $\mathcal{M}(\Omega, \mathcal{A})$ of $\sigma$-finite measures on $(\Omega, \mathcal{A})$, equipped with the $\sigma$-algebra generated by the projection maps $\pi_A : \mu \mapsto \mu A$, for all $A \in \mathcal{A}$. We define the intensity of $\xi$ to be the measure $\mathbb{E}\xi$ given by $(\mathbb{E}\xi)A = \mathbb{E}(\xi A)$, for $A \in \mathcal{A}$.

We say that $\xi$ is a point process when it is an integer-valued random measure, i.e., when $\xi A$ is a $\bar{\mathbb{Z}}_+$-valued random variable for every $A \in \mathcal{A}$. Alternatively, we may think of $\xi$ as a random element in the space of all $\sigma$-finite, integer-valued measures on $\Omega$. When $\Omega$ is Borel (e.g., when $\Omega$ is lcscH), we may write $\xi = \sum_{k<\zeta}\delta_{\gamma_k}$ for some random elements $\gamma_1, \gamma_2, \dots$ in $\Omega$ and $\zeta$ in $\bar{\mathbb{Z}}_+$, and we note that $\xi$ is simple iff the $\gamma_k$ with $k < \zeta$ are distinct. (We will say that a measurable space $(\Omega, \mathcal{A})$ is Borel if there exists a measurable bijection $\phi$ from $(\Omega, \mathcal{A})$ onto a Borel subset of $\mathbb{R}$ whose inverse is also measurable.) In general, we may eliminate the possible multiplicities in $\gamma_1, \gamma_2, \dots$ to create a simple point process $\xi^*$ which agrees with the counting measure on the support of $\xi$. By construction, $\xi^*$ is a measurable function of $\xi$.

A random measure $\xi$ on a measurable space $(\Omega, \mathcal{A})$ is said to have independent increments if the random variables $\xi A_1, \dots, \xi A_n$ are independent for any disjoint sets $A_1, \dots, A_n \in \mathcal{A}$. By a Poisson process on $\Omega$ with intensity measure $\mu \in \mathcal{M}(\Omega, \mathcal{A})$ we mean a point process $\xi$ on $\Omega$ with independent increments such that $\xi A$ is Poisson with mean $\mu A$ whenever $\mu A < \infty$. These conditions specify the distribution of $\xi$, which is therefore determined by the intensity measure $\mu$.


2.2. Completely random measures. The law of a random measure $N$ on $\Omega$ is uniquely characterized by its characteristic functional

$$f \mapsto \mathbb{E}\,e^{-Nf}, \qquad f : \Omega \to \mathbb{R}_+ \text{ measurable}, \tag{2.1}$$

where $Nf := \int f(s)\,N(ds)$. The following result characterizes Poisson processes:

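Theorem 2.1 (Campbell). Let $N$ be a Poisson process on $(\Omega, \mathcal{A})$ with nonatomic mean measure $\mu$, and let $f : \Omega \to \mathbb{R}_+$ be measurable. Then

$$\mathbb{E}\big[e^{-Nf}\big] = \exp\Big(-\int \big(1 - e^{-f(s)}\big)\,\mu(ds)\Big). \tag{2.2}$$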
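As a concrete check of Eq. (2.2), take $N$ to be a Poisson process on $[0,1]$ with mean measure $\lambda$ times Lebesgue measure, and let $f(s) = s$. Then $\int_0^1 (1 - e^{-s})\,ds = e^{-1}$, so the right-hand side equals $\exp(-\lambda/e)$. The Monte Carlo sketch below, with hypothetical function names, compares the two sides. It samples the Poisson count with a hand-written helper and places the atoms uniformly.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def campbell_check(lam, trials, rng):
    """Monte Carlo E[exp(-N f)] versus Eq. (2.2) for N Poisson on [0, 1], f(s) = s."""
    total = 0.0
    for _ in range(trials):
        atoms = [rng.random() for _ in range(poisson(lam, rng))]
        total += math.exp(-sum(atoms))  # N f = sum of f over the atoms of N
    exact = math.exp(-lam * math.exp(-1.0))  # int_0^1 (1 - e^{-s}) ds = e^{-1}
    return total / trials, exact
```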


By a completely random measure we simply mean a random measure with independent increments. Poisson processes are the simplest type of completely random measure, and, as we will see, they play a fundamental role in the theory of completely random measures. (The interested reader is referred to Kingman [Kin67] and Kallenberg [Kal02, Chp. 12] for further details. Note that our definition of random measure ensures certain weak finiteness conditions.)

Let $\xi$ be a random measure on an lcscH space $(\Omega, \mathcal{A})$. We say that $s \in \Omega$ is a fixed atom when $\mathbb{P}\{\xi\{s\} > 0\} > 0$. We begin with a characterization of completely random measures without fixed atoms:


Theorem 2.2 (Kingman [Kin67], Kallenberg [Kal02, Cor. 12.11]). Let $\xi$ be a random measure on $(\Omega, \mathcal{A})$ such that $\xi\{s\} = 0$ a.s. for all $s$. Then $\xi$ is a completely random measure if and only if

$$\xi A = \mu A + \int_{(0,\infty)} p\,\eta(A \times dp) \quad \text{a.s.}, \qquad A \in \mathcal{A}, \tag{2.3}$$

for some nonrandom nonatomic measure $\mu$ on $\Omega$ and Poisson process $\eta$ on $\Omega \times (0, \infty)$. Furthermore, $\xi A < \infty$ a.s. for some $A \in \mathcal{A}$ if and only if $\mu A < \infty$ and

$$\int_{(0,\infty)} (p \wedge 1)\,\mathbb{E}\eta(A \times dp) < \infty. \tag{2.4}$$

Note that $\mathbb{E}\eta(\{s\} \times (0, \infty)) = 0$ for all $s$.

Remark 2.3. We will sometimes write Eq. (2.3) more compactly as

$$\xi = \mu + \sum_{(s,p)\in\eta} p\,\delta_s. \tag{2.5}$$

◁

When a completely random measure $\xi$ has the form of Eq. (2.3), we call $\mu$ the nonrandom nonatomic component, and we call $\xi - \mu$ the ordinary component.

It follows immediately from Theorem 2.2 that an arbitrary completely random measure is of the form $\xi + \chi$, where $\xi$ is as in Theorem 2.2 and $\chi$ is a purely atomic random measure, independent of the Poisson process $\eta$, and supported on a nonrandom countable subset $\mathscr{A} \subseteq \Omega$, where the random variables $\chi\{s\}$, for $s \in \mathscr{A}$, are independent. In this case, we call $\chi$ the fixed-atomic component and the elements of $\mathscr{A}$ the fixed atoms.

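Consider the so-called Lévy measure on $\Omega \times (0, \infty)$ given by

$$\nu := \mathbb{E}\sum_{s\,:\,(\xi+\chi)\{s\} > 0} \delta_{(s,\,(\xi+\chi)\{s\})}. \tag{2.6}$$

It follows that

$$\nu = \mathbb{E}\eta + \sum_{s\in\mathscr{A}} \delta_s \otimes \mathbb{P}\{\chi\{s\} \in \cdot \cap (0, \infty)\} \tag{2.7}$$

and $\nu(\{s\} \times (0, \infty)) \le 1$ for all $s \in \Omega$. Hence the law of a completely random measure $\xi + \chi$ is uniquely characterized by its nonrandom nonatomic component $\mu$ together with its Lévy measure $\nu$, since the latter encodes the position and distribution of the fixed atoms as well as the intensity of the underlying Poisson process.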

In the other direction, given any $\sigma$-finite measure $\nu$ on $\Omega \times (0, \infty)$ such that $\nu(\{s\} \times (0, \infty)) \le 1$ for every $s \in \Omega$, there is a completely random measure whose Lévy measure is $\nu$. In particular, let

$$\mathscr{A} := \{s \in \Omega : \nu(\{s\} \times (0, \infty)) > 0\} \tag{2.8}$$

be the countable set of nonnull $\Omega$-sections of $\nu$. Let $\tilde\nu$ be the restriction of $\nu$ to $(\Omega \setminus \mathscr{A}) \times (0, \infty)$, and let $\eta$ be a Poisson process with intensity $\tilde\nu$. Further, let $\chi$ be a random measure independent of $\eta$ and supported on $\mathscr{A}$ such that the masses $\chi\{s\}$, for $s \in \mathscr{A}$, are independent with distribution $\nu(\{s\} \times \cdot) + (1 - \nu(\{s\} \times (0, \infty)))\,\delta_0$. Then

$$\chi + \sum_{(s,p)\in\eta} p\,\delta_s \tag{2.9}$$

is completely random, with Lévy measure $\nu$.

If we write $M_s(t) := \mathbb{E}[e^{-t\chi\{s\}}]$ for the moment-generating function of $-\chi\{s\}$, for $s \in \mathscr{A}$, then the characteristic functional of $\xi + \chi$ is the map

$$f \mapsto \exp\Big(-\mu f + \sum_{s\in\mathscr{A}} \log M_s(f(s)) - \int \big(1 - e^{-p f(s)}\big)\,\mathbb{E}\eta(ds \times dp)\Big). \tag{2.10}$$

We now introduce two families of completely random measures, beta and Bernoulli processes, that are the focus of Section 3. The beta process was introduced by Hjort [Hjo90] and later connected to the Indian buffet process by Thibaux and Jordan [TJ07], who introduced the notion of Bernoulli processes as defined below.
For the remainder, let

$$H_0 := \bar H_0 + \sum_{s\in\mathscr{A}_0} H_0\{s\}\,\delta_s \tag{2.11}$$

be a $\sigma$-finite measure on $\Omega$, where $\bar H_0$ is nonatomic, $\mathscr{A}_0 \subseteq \Omega$ is countable, and $H_0\{s\} \in (0, 1]$ for all $s \in \mathscr{A}_0$.


2.3. Bernoulli processes. By a Bernoulli process with hazard measure $H_0$ we mean a purely atomic completely random measure $X$ with Lévy measure $H_0 \otimes \delta_1$, written

$$X \sim \mathrm{BeP}(H_0). \tag{2.12}$$

In particular, $X$ has no nonrandom nonatomic component. Its fixed atoms are the atoms $\mathscr{A}_0$ of $H_0$, each $s \in \mathscr{A}_0$ appearing independently in $X$ with probability $H_0\{s\}$. The intensity measure of the Poisson process underlying the ordinary component $\tilde X$ is $(ds, dq) \mapsto \delta_1(dq)\,\bar H_0(ds)$. The characteristic functional of $X$ is

$$f \mapsto \exp\Big(-\int \big(1 - e^{-f(s)}\big)\,\bar H_0(ds)\Big) \times \prod_{s\in\mathscr{A}_0}\big(1 - H_0\{s\} + H_0\{s\}\,e^{-f(s)}\big), \tag{2.13}$$

which, in light of Theorem 2.1, highlights the relationship between Bernoulli processes and Poisson processes. In particular, the ordinary component of $X$ is simply a Poisson process with intensity $\bar H_0$.
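The decomposition above translates directly into a sampler. The ordinary component is a Poisson process with intensity $\bar H_0$, and each fixed atom $s$ is included independently with probability $H_0\{s\}$. The sketch below assumes, for illustration only, that $\Omega = [0,1]$, that $\bar H_0$ is $c$ times Lebesgue measure, and that the fixed atoms are given as a dictionary. The function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_bernoulli_process(c, fixed_atoms, rng):
    """Atoms of X ~ BeP(H0) on [0, 1], where H0 = c * Lebesgue + sum_s H0{s} delta_s.

    fixed_atoms maps each fixed atom s to H0{s} in (0, 1].
    """
    # Ordinary component: Poisson process with intensity c * Lebesgue (a.s. distinct points).
    ordinary = [rng.random() for _ in range(poisson(c, rng))]
    # Fixed-atomic component: include s independently with probability H0{s}.
    fixed = [s for s, h in fixed_atoms.items() if rng.random() < h]
    return ordinary + fixed
```

Viewed as a random set, the draw has expected cardinality $c + \sum_s H_0\{s\} = H_0(\Omega)$, consistent with the remark on the mass parameter below.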

By the form of the Lévy measure, it is straightforward to show that a Bernoulli process is a.s. simple. In fact, every a.s. simple completely random measure is a Bernoulli process:


Theorem 2.4. Let $X$ be a random measure on an lcscH space $(\Omega, \mathcal{A})$. Then $X$ is a Bernoulli process if and only if $X$ is a.s. simple and completely random.


Proof. The forward direction follows in a straightforward way from the definition of a Bernoulli process. In the other direction, let $X$ be a.s. simple and completely random, and put $H_0 = \mathbb{E}X$. Then $X = \chi + \xi$ is the sum of a fixed-atomic component $\chi$ and an ordinary component $\xi$. By the a.s. simplicity of $\chi$, we may write $\chi = \sum_{s\in\mathscr{A}_0} \beta_s\,\delta_s$, where $\mathscr{A}_0 \subseteq \Omega$ is countable and the $\beta_s$, for $s \in \mathscr{A}_0$, are independent Bernoulli random variables with mean $H_0\{s\}$. It follows that the Lévy measure of $\chi$ is $H_0(\cdot \cap \mathscr{A}_0) \otimes \delta_1$, and so $\chi$ is a Bernoulli process. By independence of increments, it suffices to show that $\xi$ is also a Bernoulli process.

We have $\xi\{s\} = 0$ a.s. for all $s$. Hence $\eta := \sum_{s : \xi\{s\} > 0} \delta_{(s,\,\xi\{s\})}$ is a completely random point process on $\Omega \times (0, \infty)$ satisfying $\eta(\{s\} \times (0, \infty)) = 0$ a.s. for all $s$, and moreover, $\eta(\cdot \times (0, \infty))$ is a $\sigma$-finite point process. It follows from [Kal02, Thm. 12.10] that $\eta$ is a Poisson process. But then $\xi$ is completely random with Lévy measure $\mathbb{E}\eta = H_0(\cdot \setminus \mathscr{A}_0) \otimes \delta_1$, and is thus a Bernoulli process. □


It follows immediately that the law of a Bernoulli process $X$ is characterized by its mean $\mathbb{E}X = H_0$. (If the so-called mass parameter $H_0(\Omega)$ is finite, then it is also the expected cardinality of the Bernoulli process when considered as a random set.)


2.4. Beta processes. Let $\theta : \Omega \to \mathbb{R}_+$ be a measurable function. By a beta process with concentration function $\theta$ and mean $H_0$, written

$$H \sim \mathrm{BP}(\theta, H_0), \tag{2.14}$$

we mean a purely atomic completely random measure $H$ on $(\Omega, \mathcal{A})$ with Lévy measure $\nu_{\mathrm{na}} + \nu_{\mathrm{a}}$, where

$$\nu_{\mathrm{na}}(ds, dp) = \theta(s)\,p^{-1}(1-p)^{\theta(s)-1}\,dp\,\bar H_0(ds) \tag{2.15}$$

corresponds to the ordinary component and

$$\nu_{\mathrm{a}} = \sum_{s\in\mathscr{A}_0} \delta_s \otimes \mathrm{Beta}\big(\theta(s)\,H_0\{s\},\ \theta(s)\,(1 - H_0\{s\})\big) \tag{2.16}$$

corresponds to the fixed-atomic component. (Implicit is the requirement that $\nu_{\mathrm{na}}$ is $\sigma$-finite, which follows, e.g., if $\theta$ is bounded, because $H_0$ is assumed to be $\sigma$-finite.) When $\theta$ is constant, we will refer to it as the concentration parameter.

The remainder of the document will provide a great deal of insight into the structure of a beta process, but it is worthwhile stating a few of its properties here. First of all, as expected, $\mathbb{E}H = H_0$. When $\bar H_0$ is nonzero, $\nu_{\mathrm{na}}$ is merely $\sigma$-finite, even when $H_0$ is finite, and so the ordinary component $\tilde H$ of $H$ has infinitely many atoms with probability one. The beta process has several direct “stick-breaking” constructions. Several of these ([TJ07; Pai+10; PBJ12]) have analogues in Section 1 and later, so here we describe one due to Teh, Görür, and Ghahramani [TGG07], which is particularly simple to describe. In the case when $H_0$ is nonatomic, $\gamma := H_0(\Omega) < \infty$, and $\theta = 1$,

$$H \overset{d}{=} \sum_{n=1}^{\infty}\Big(\prod_{i=1}^{n} V_i\Big)\,\delta_{s_n}, \tag{2.17}$$

where $(s_n)_{n\in\mathbb{N}}$ are i.i.d. and $\gamma^{-1}H_0$-distributed, and $(V_j)_{j\in\mathbb{N}}$ are i.i.d. and $\mathrm{Beta}(\gamma, 1)$-distributed.
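A truncated version of Eq. (2.17) takes only a few lines. The sketch below keeps the first `truncation` atoms and, as an illustrative assumption, draws locations uniformly on $[0,1]$ in place of $\gamma^{-1}H_0$. The function name is hypothetical. Since $\mathbb{E}V_i = \gamma/(\gamma+1)$, the expected retained mass is $\gamma\,(1 - (\gamma/(\gamma+1))^T)$, which approaches $\mathbb{E}H(\Omega) = \gamma$.

```python
import random


def beta_process_tgg(gamma, truncation, rng):
    """First `truncation` atoms (s_n, prod_{i<=n} V_i) of Eq. (2.17), V_i ~ Beta(gamma, 1).

    Locations are uniform on [0, 1], an illustrative stand-in for gamma^{-1} H0.
    """
    atoms, weight = [], 1.0
    for _ in range(truncation):
        weight *= rng.betavariate(gamma, 1.0)  # weights decrease along the sequence
        atoms.append((rng.random(), weight))
    return atoms
```

Unlike the round-based constructions of Section 1, the weights here come out in decreasing order, so truncation always discards the smallest atoms.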

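Finally, the beta and Bernoulli processes are conjugate in the following sense:

Theorem 2.5 (conjugacy; Hjort, Kim, Thibaux-Jordan). Let $H$ be a beta process on $\Omega$ with mean $H_0$ and concentration parameter $\theta > 0$. Conditioned on $H$, let $(X_n)_{n\in\mathbb{N}}$ be independent Bernoulli processes with hazard measure $H$. Then, writing $X_{[n]} := (X_1, \dots, X_n)$,

$$X_{n+1} \mid X_{[n]} \sim \mathrm{BeP}\Big(\frac{\theta}{\theta+n}\,H_0 + \frac{1}{\theta+n}\sum_{i=1}^{n} X_i\Big). \tag{2.18}$$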

Remark 2.6. This result was first shown by Hjort [Hjo90, Cor. 4.1] for the case of censored observations in $\Omega = \mathbb{R}$. It can also be seen as a corollary of a result due to Kim [Kim99, Thm. 3.3], who studied censored observations from general completely random hazard measures. Thibaux and Jordan [TJ07] presented the result in the form above, and showed that an Indian buffet process was the combinatorial structure of a conditionally i.i.d. sequence of Bernoulli processes satisfying Eq. (2.18). Note that Theorem 3.3 of [Kim99] assumes that the nonrandom nonatomic part of $H_0$ is absolutely continuous with respect to Lebesgue measure. This is not necessary; indeed, the proof does not rely on the assumption in any deep way. Theorem 6.7 implies the above claim with no such assumption, and so we omit the proof here. ◁

The conditional independence and mean structure above will reappear many 
times below, and so we introduce the following terminology: 

Definition 2.7. Let $H$ be a random measure and let $(X_n)_{n\in\mathbb{N}}$ be a sequence of Bernoulli processes. We will say that $(X_n)_{n\in\mathbb{N}}$ is an exchangeable sequence of Bernoulli processes directed by $H$ (or, when the context is clear, that $(X_n)_{n\in\mathbb{N}}$ is directed by $H$) when, conditioned on $H$, the $X_1, X_2, \dots$ are independent Bernoulli processes with hazard measure $H$.

For every measurable function $f : \Omega \to \mathbb{R}_+$ and measure $\mu$ on $(\Omega, \mathcal{A})$, define the measure $f\mu$ by $(f\mu)(A) = \int_A f(s)\,\mu(ds)$, for $A \in \mathcal{A}$. The following result will be used regularly without comment, and follows easily from an approximation by simple functions and monotone convergence:

Proposition 2.8. Let $f : \Omega \to \mathbb{R}_+$ be a nonnegative, measurable function and $\xi$ a random measure on $\Omega$ with intensity $H$. Then $\mathbb{E} f\xi = fH$. □


3. The continuum-of-Blackwell-MacQueen-urns scheme 


Let $\theta > 0$, let $U(0,1)$ denote the uniform distribution on $[0,1]$, and let $(Z_n)_{n\in\mathbb{N}} := (Z_1, Z_2, \dots)$ be a sequence of random variables in $[0,1]$ such that $Z_1 \sim U(0,1)$ and

$$Z_{n+1} \mid Z_{[n]} \sim \frac{\theta}{\theta+n}\,U(0,1) + \frac{1}{\theta+n}\sum_{j\le n}\delta_{Z_j} \quad \text{for } n \in \mathbb{N}, \tag{3.1}$$

where $Z_{[n]} := (Z_1, \dots, Z_n)$. In other words, $(Z_n)_{n\in\mathbb{N}}$ is a Blackwell-MacQueen urn scheme, i.e., a conditionally i.i.d. sequence of random variables directed by a Dirichlet process (with mean $U(0,1)$ and concentration parameter $\theta$, in this case). The combinatorial structure of $(Z_n)_{n\in\mathbb{N}}$, i.e., the random partition of $\mathbb{N}$ induced by the random equivalence relation $\{(n,m) \in \mathbb{N}\times\mathbb{N} : Z_n = Z_m\}$, is that of a Chinese restaurant process, in which the probability of a new table is proportional to $\theta$.
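To make Eq. (3.1) concrete, here is a minimal Python sketch of the Blackwell-MacQueen urn with base measure $U(0,1)$ (the function name is hypothetical). A known property of the Chinese restaurant process is that the expected number of distinct values among $Z_1, \dots, Z_n$ is $\sum_{i=0}^{n-1}\theta/(\theta+i)$, which gives a simple Monte Carlo check.

```python
import random


def blackwell_macqueen_urn(n, theta, rng):
    """Sample Z_1, ..., Z_n from the urn of Eq. (3.1) with base measure U(0,1)."""
    zs = []
    for i in range(n):  # i = number of values drawn so far
        # With probability theta / (theta + i) draw a fresh uniform value;
        # otherwise copy one of the i previous values chosen uniformly,
        # so each previous index is copied with probability 1 / (theta + i).
        if rng.random() < theta / (theta + i):
            zs.append(rng.random())
        else:
            zs.append(zs[rng.randrange(i)])
    return zs
```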

The focus of the remainder of the article is the following construction and its generalizations: Let $Y := (Y_n)_{n\in\mathbb{N}}$ be a nonrandom sequence of simple measures on a lcscH space $(\Omega, \mathcal{A})$ concentrated on a locally-finite countable set $\mathcal{U}$, and, for every $s \in \mathcal{U}$, let $(Z^s_n)_{n\in\mathbb{N}}$ be an independent copy of $(Z_n)_{n\in\mathbb{N}}$. Consider the sequence $(X_n)_{n\in\mathbb{N}}$ of simple point processes on $\Omega$, concentrated on $\mathcal{U}$, where $X_1 := Y_1$ and, for every $n \in \mathbb{N}$ and $s \in \mathcal{U}$, we define

$$X_{n+1}\{s\} := \begin{cases} X_j\{s\} & \text{if } Z^s_{n+1} = Z^s_j, \text{ where } j \le n, \\ Y_{n+1}\{s\} & \text{otherwise}, \end{cases} \tag{3.2}$$

and, for every $A \in \mathcal{A}$, define $X_{n+1}A := \sum_{s \in A\cap\mathcal{U}} X_{n+1}\{s\}$.

In other words, $X_{n+1}$ is a simple random measure, whose atoms are some random subset of the atoms among $Y_1, \dots, Y_{n+1}$, and, in particular, conditioned on $X_{[n]}$, each such atom $s$ is an atom of $X_{n+1}$ independently with probability

$$\frac{\theta}{\theta+n}\,Y_{n+1}\{s\} + \frac{1}{\theta+n}\sum_{j=1}^{n} X_j\{s\}. \tag{3.3}$$

It is straightforward to show that $(X_n)_{n\in\mathbb{N}}$ is well-defined and that its law, which we will denote by $Q_Y$, is a measurable function of $Y$.
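The construction (3.2) can be simulated directly when only finitely many atoms are involved. The sketch below is an illustration under simplifying assumptions: each $Y_n$ is represented as a finite Python set of atom labels (all masses equal to one), each urn $Z^s$ is represented only through integer token labels (the values in $[0,1]$ matter only through ties), and the names are hypothetical. By Eq. (3.3), if $Y_1 = \delta_a$ and $Y_2 = 0$, then $a$ is an atom of $X_2$ with probability $1/(\theta+1)$.

```python
import random


def one_parameter_scheme(ys, theta, rng):
    """Sketch of the construction (3.2) for a finite, nonrandom sequence ys.

    ys[n] is the finite set of atoms of Y_{n+1}. Returns xs, where xs[n] is
    the set of atoms of X_{n+1}.
    """
    atoms = set().union(*ys)
    tokens = {s: [] for s in atoms}  # tokens[s][j]: integer token of Z^s_{j+1}
    value = {s: {} for s in atoms}   # value[s][t]: X{s} shared by all copies of token t
    xs = []
    for n, y in enumerate(ys):       # n = number of previous draws in every urn
        x = set()
        for s in atoms:
            if rng.random() < theta / (theta + n):
                t = len(value[s])            # new token: X_{n+1}{s} := Y_{n+1}{s}
                value[s][t] = s in y
            else:
                t = tokens[s][rng.randrange(n)]  # repeat a token: copy X_j{s}
            tokens[s].append(t)
            if value[s][t]:
                x.add(s)
        xs.append(x)
    return xs
```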


Definition 3.1 (one-parameter scheme). Let $H_0$ be a hazard measure on $(\Omega, \mathcal{A})$ and let $X := (X_n)_{n\in\mathbb{N}}$ be a sequence of random measures on $(\Omega, \mathcal{A})$. Then we will say that $X$ is a one-parameter scheme with mean hazard measure $H_0$ and concentration parameter $\theta$ when there exists an i.i.d. sequence $Y := (Y_n)_{n\in\mathbb{N}}$ of Bernoulli processes with hazard measure $H_0$ such that $\mathbb{P}[X \mid Y] = Q_Y$ a.s.


Remark 3.2. We have defined a one-parameter scheme in this way so that the relationship to the more general continuum-of-urns scheme defined in Section 6 is manifest.¹ Because of the special properties of the Blackwell-MacQueen urn scheme, we will see that the law of a one-parameter scheme has a simple conditional characterization, which could equally well have served as the definition. ◁

¹ We could have constructed $(X_n)_{n\in\mathbb{N}}$ directly from a sequence of simple point processes $(Y_n)_{n\in\mathbb{N}}$, although doing this rigorously is somewhat cumbersome. This construction, based on a randomization, sidesteps several measure-theoretic complications. (See [Kal02, p. 226] for another example of randomization.) A nonstandard, though elegant, construction would employ an i.i.d. collection $Z^s$, for $s \in \Omega$, of urn schemes, but working with a continuum of i.i.d. random variables leads quickly to measurability roadblocks unless one, e.g., works on the minimal (and unique [HS06]) extension of the basic probability space that made $Z$ and $X$ jointly measurable. On this extended space, we would have a one-way Fubini property, which would allow us to show that the joint law of $(Y_n)_{n\in\mathbb{N}}$ and $(X_n)_{n\in\mathbb{N}}$ is precisely as described above. Arguably, a construction using such an i.i.d. process is more aptly named a continuum-of-urns scheme, but we have decided to give a standard construction.

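Let $\mathcal{G}_n := \sigma(Y_{n+1}, X_{[n]})$. From Eq. (3.3), we may conclude that $X_{n+1}$ is a.s. simple and has conditionally independent increments given $\mathcal{G}_n$. Therefore, conditioned on $\mathcal{G}_n$, by Theorem 2.4, $X_{n+1}$ is a Bernoulli process with hazard measure given by

$$\mathbb{E}^{\mathcal{G}_n}X_{n+1} = \frac{\theta}{\theta+n}\,Y_{n+1} + \frac{1}{\theta+n}\sum_{i=1}^{n} X_i \quad \text{a.s.} \tag{3.4}$$

Because $Y$ is an i.i.d. sequence, we may conclude from the definition of $Q$ that $Y_{n+1}$ is independent of $\mathcal{F}_n := \sigma(X_{[n]}) \subseteq \mathcal{G}_n$, and so, by the chain rule of conditional expectation,

$$\mathbb{E}^{\mathcal{F}_n}X_{n+1} = \mathbb{E}^{\mathcal{F}_n}\big[\mathbb{E}^{\mathcal{G}_n}X_{n+1}\big] = \frac{\theta}{\theta+n}\,\mathbb{E}Y_1 + \frac{1}{\theta+n}\sum_{i=1}^{n} X_i \quad \text{a.s.} \tag{3.5}$$

Moreover, $X_{n+1}$ has conditionally independent increments given $\mathcal{F}_n$ because $Y_{n+1}$ does. It follows that

$$X_{n+1} \mid \mathcal{F}_n \sim \mathrm{BeP}\big(\mathbb{E}^{\mathcal{F}_n}X_{n+1}\big). \tag{3.6}$$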

From Theorem 2.5, we can now recognize $(X_n)_{n\in\mathbb{N}}$ as having the same conditional law as a conditionally i.i.d. sequence of Bernoulli processes directed by a beta process.

Theorem 3.3 (de Finetti measure). Let $(X_n)_{n\in\mathbb{N}}$ be a one-parameter scheme on $(\Omega, \mathcal{A})$ with mean hazard measure $H_0$ and concentration parameter $\theta$. Then there is an a.s. unique random measure $H$ given by

$$H(A) = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} X_i(A) \quad \text{a.s.}, \quad A \in \mathcal{A}. \tag{3.7}$$

Moreover, $H$ is a beta process with mean $H_0$ and concentration parameter $\theta$, and, conditioned on $H$, the random measures $X_1, X_2, \dots$ are independent Bernoulli processes with hazard measure $H$.

Proof. By Eqs. (3.5) and (3.6), for all $n \in \mathbb{Z}_+$, the conditional distribution of $X_{n+1}$ given $X_{[n]}$ agrees with Eq. (2.18) in Theorem 2.5 when we take $H_0 := \mathbb{E}Y_1$. As the law of a stochastic process is determined by its finite-dimensional distributions, it follows that $(X_n)_{n\in\mathbb{N}}$ has the same law as an exchangeable sequence $(X'_n)_{n\in\mathbb{N}}$ of Bernoulli processes directed by a beta process $H'$ with mean $H_0$ and concentration $\theta$.

By a transfer argument (Theorem A.1), there exists a random measure $H$ such that $(H, X_1, X_2, \dots) \overset{d}{=} (H', X'_1, X'_2, \dots)$, and so $H$ renders $(X_n)_{n\in\mathbb{N}}$ conditionally independent; $H \overset{d}{=} H'$ and in particular is a beta process with mean $H_0$ and concentration $\theta$; and conditioned on $H$, each $X_n$ is a Bernoulli process with hazard measure $H$. Finally, letting $\mathcal{H} := \sigma(H)$, we have $\mathbb{E}^{\mathcal{H}}X_n = H$ and therefore, we may conclude from the disintegration theorem ([Kal02, Thm. 6.4]) and the law of large numbers that Eq. (3.7) holds. Because $\Omega$ is Polish, there is a countable π-system that generates $\mathcal{A}$, and so this convergence holds simultaneously for the π-system and thus $H$ is a.s. unique by [Kal02, Lem. 1.17]. □

As one can anticipate from the work of Thibaux and Jordan [TJ07], the one-parameter scheme is related to the Indian buffet process (IBP) introduced by Griffiths and Ghahramani [GG05]. In order to make the connection precise, we introduce the following quotient of the space of sequences of simple measures: for any pair $U := (U_1, U_2, \dots)$ and $V := (V_1, V_2, \dots)$ of sequences of simple measures, $U \sim V$ when there exists a Borel isomorphism $\varphi : \Omega \to \Omega$ satisfying $U_n = V_n \circ \varphi$ for every $n$. It is easy to verify that $\sim$ is an equivalence relation. Let $[U]$ denote the equivalence class containing $U$. This quotient space is itself a Polish space, and can be related to the Polish space of sequences of simple measures by coarsening the σ-algebra to that generated by the functionals

$$\Phi_h(U_1, U_2, \dots) := \#\{s \in \Omega : (\forall j \le n)\ U_j\{s\} = h(j)\}, \quad h \in \{0,1\}^n,\ n \in \mathbb{N}. \tag{3.8}$$

In other words, $[U]$ maintains only enough information to count, for every $n \in \mathbb{N}$ and subset $S \subseteq [n]$, how many points $s \in \Omega$ are atoms of every and only those measures $U_j$, for $j \in S$, among $U_{[n]}$. (In the author's opinion, this is the more natural characterization of the equivalence classes induced by the so-called left-ordered form proposed by Griffiths and Ghahramani [GG05].)
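The functionals in Eq. (3.8) amount to a histogram of atom histories. The Python sketch below assumes each $U_j$ is given as a finite set of atom labels (names hypothetical); relabeling the atoms by a bijection, which plays the role of the Borel isomorphism, leaves the histogram unchanged, and the all-zero history is omitted because only atoms of some $U_j$ are observed.

```python
from collections import Counter


def history_counts(us):
    """Phi_h for every nonzero h in {0,1}^n, with n = len(us) (cf. Eq. (3.8)).

    us[j] is the finite set of atoms of U_{j+1}. Maps each history h (a tuple
    of 0/1 values) to the number of atoms s with U_j{s} = h(j) for all j <= n.
    """
    atoms = set().union(*us)
    return Counter(tuple(int(s in u) for u in us) for s in atoms)
```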

The following connection with IBPs follows both from Theorem 3.3 and [TJ07], as well as from the general result (Theorem 7.1) for the continuum-of-urns scheme:

Theorem 3.4. Let $\theta > 0$, let $H_0$ be a nonatomic hazard measure such that $\gamma := H_0(\Omega) < \infty$, and let $(X_n)_{n\in\mathbb{N}}$ be a one-parameter scheme induced by $(Y_n)_{n\in\mathbb{N}}$ with concentration parameter $\theta$. Then $[(X_n)_{n\in\mathbb{N}}]$ is an IBP with mass parameter $\gamma$ and concentration parameter $\theta$.

The IBP was first defined for the case $\theta = 1$ by Griffiths and Ghahramani [GG05; GG06], and later generalized to $\theta > 0$ by Ghahramani, Griffiths, and Sollich [GGS07].
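For comparison with Theorem 3.4, the following is a standard sequential sampler for the IBP with mass $\gamma$ and concentration $\theta$, written as a Python sketch (names hypothetical; Python's `random` module has no Poisson sampler, so Knuth's multiplication method is included). The probabilities for customer $n+1$ match Eq. (3.3) when $H_0$ is nonatomic: a dish already taken by $m$ customers is chosen with probability $m/(\theta+n)$, and the new dishes form a Poisson number with mean $\gamma\theta/(\theta+n)$, obtained by thinning the $\mathrm{Poisson}(\gamma)$ atoms of $Y_{n+1}$. The expected total number of dishes after $n$ customers is therefore $\gamma\sum_{i=0}^{n-1}\theta/(\theta+i)$.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def indian_buffet(n, gamma, theta, rng):
    """IBP with mass gamma and concentration theta; each row is a set of dish ids."""
    counts = []  # counts[k]: number of earlier customers who took dish k
    rows = []
    for i in range(n):  # i = number of earlier customers
        row = {k for k, m in enumerate(counts) if rng.random() < m / (theta + i)}
        for k in row:
            counts[k] += 1
        new = poisson(gamma * theta / (theta + i), rng)  # thinned new dishes
        row |= set(range(len(counts), len(counts) + new))
        counts.extend([1] * new)
        rows.append(row)
    return rows
```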


4. Exchangeable sequences and partitions 

In this section, we introduce some concepts relating to exchangeable sequences of 
random variables and partitions. (The following development owes much to [Pit96], 
which also provides more details for the interested reader.) These results will 
be used subsequently to introduce and characterize a generalization of the one- 
parameter scheme. 

Let $(Z_n)_{n\in\mathbb{N}}$ be an exchangeable sequence of random variables taking values in a lcscH space, and assume that the marginal distribution of the first (and thus every) element, $\nu := \mathbb{P}\{Z_1 \in \cdot\}$, is nonatomic. We are interested in the combinatorial structure of the sequence. In particular, let $\Pi_n$ and $\Pi$ be the random partitions of $[n]$ and $\mathbb{N}$, respectively, induced by the equivalence relation

$$i \sim j \iff Z_i = Z_j. \tag{4.1}$$

We may then write $\Pi = \{C_1, C_2, \dots\}$, where $C_1$ is the class containing 1, and $C_{n+1}$ is the class containing the first element of $\mathbb{N} \setminus \bigcup_{i\le n} C_i$, provided such an element exists, and is the empty class otherwise. (I.e., we define $C_i := \emptyset$ when $\Pi$ contains fewer than $i$ classes.)

To complete the decomposition of $(Z_n)_{n\in\mathbb{N}}$, define

$$M_i := \inf C_i, \quad \text{for } i \in \mathbb{N}, \tag{4.2}$$

with the convention that $\inf\emptyset = \infty$. On the event that $M_i < \infty$, i.e., $C_i$ is nonempty, define $\tilde Z_i := Z_{M_i}$. We say that $\tilde Z_i$ is the $i$-th token to appear. It is clear that the partition $\Pi$ and tokens $\tilde Z_i$ completely determine the sequence $(Z_n)_{n\in\mathbb{N}}$.

We proceed to characterize the probabilistic structure of the combinatorial part. Let $K_n \le n$ be the number of equivalence classes in $\Pi_n$, i.e., the number of unique elements among $\{Z_1, \dots, Z_n\}$, and take $K_0 := 0$. For every $j, n \in \mathbb{N}$, let $N_{jn} := \#\{i \le n : Z_i = \tilde Z_j\}$ denote the multiplicity of the $j$-th token to appear among the first $n$ elements. Then $N_n := (N_{1n}, N_{2n}, \dots)$ is a vector of counts for each token, and is necessarily a sequence of $K_n$ positive integers terminated by an infinite sequence of zeros, and so we may identify the range of these random vectors with the space $\mathbb{N}^* := \bigcup_{n\in\mathbb{N}}\mathbb{N}^n$ of finite sequences of positive integers in the obvious way. For a sequence $\mathbf{n} = (n_1, \dots, n_k, 0, 0, \dots)$, let $\mathbf{n}^{(j+)}$ be the sequence where $n_j$ is incremented by 1.

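For every finite sequence $(n_1, \dots, n_k) \in \mathbb{N}^*$, with $n := n_1 + \dots + n_k$, we may define

$$\pi(n_1, \dots, n_k) := \mathbb{P}\{\Pi_n = \{A_1, \dots, A_k\}\}, \tag{4.3}$$

where $\{A_1, \dots, A_k\}$ is any partition of $[n]$ into blocks of sizes $\#A_i = n_i$. By exchangeability, the right-hand side does not depend on the choice of partition, and $\pi$ is a symmetric function. By construction,

$$\pi(1) = 1 \quad\text{and}\quad \pi(\mathbf{n}) = \sum_{j=1}^{k+1}\pi(\mathbf{n}^{(j+)}) \quad \text{for every } \mathbf{n} \in \mathbb{N}^*. \tag{4.4}$$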

A symmetric function on $\mathbb{N}^*$ satisfying Eq. (4.4) is known as an exchangeable partition probability function (EPPF) and can be seen to completely characterize the distribution of the combinatorial structure of $(Z_n)_{n\in\mathbb{N}}$. (See [Pit96] for more details.) In particular, it can be shown that the conditional distribution of $Z_{n+1}$ given $Z_{[n]}$ is

$$\sum_{j=1}^{K_n}\frac{\pi(N_n^{(j+)})}{\pi(N_n)}\,\delta_{\tilde Z_j} + \frac{\pi(N_n^{((K_n+1)+)})}{\pi(N_n)}\,\nu. \tag{4.5}$$

Because of this underlying urn scheme structure, we will refer to any exchangeable sequence with nonatomic marginal distributions as a $\pi$-scheme. When Eq. (4.5) holds, we will say that the $\pi$-scheme has marginals $\nu$.
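Eqs. (4.4) and (4.5) can be checked with exact rational arithmetic. The Python sketch below assumes, for illustration, the EPPF associated with the Blackwell-MacQueen urn of Eq. (3.1), namely $\theta^{k-1}\prod_i (n_i-1)!/[1+\theta]_{n-1}$ with $[x]_m$ the rising factorial; the function names are hypothetical. The weights returned by `predictive` sum to one exactly when the consistency condition (4.4) holds at that count vector, and for this EPPF they reduce to $n_j/(\theta+n)$ for the existing tokens and $\theta/(\theta+n)$ for a new one.

```python
from fractions import Fraction
from math import factorial


def ewens_eppf(counts, theta):
    """theta^(k-1) * prod_i (n_i - 1)! / [1 + theta]_{n-1}, with [x]_m rising."""
    n, k = sum(counts), len(counts)
    rising = Fraction(1)
    for j in range(n - 1):
        rising *= 1 + theta + j
    num = theta ** (k - 1)
    for c in counts:
        num *= factorial(c - 1)
    return num / rising


def predictive(eppf, counts):
    """Weights of Eq. (4.5): one per existing token, then the new-token weight."""
    base = eppf(counts)
    bumped = [eppf(counts[:j] + [counts[j] + 1] + counts[j + 1:]) / base
              for j in range(len(counts))]
    return bumped + [eppf(counts + [1]) / base]
```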

By de Finetti's theorem, we know there is an a.s. tail-measurable random probability measure $\mu$ such that

$$(Z_n)_{n\in\mathbb{N}} \mid \mu \overset{\text{iid}}{\sim} \mu. \tag{4.6}$$

(We say that $\mu$ is the random measure directing the exchangeable sequence $(Z_n)_{n\in\mathbb{N}}$.) In order to characterize $\mu$ further, let $P_i$ be the a.s. limiting relative frequency of $C_i$, i.e., define

$$P_i := \lim_{n\to\infty}\frac{N_{in}}{n} \quad \text{a.s.}, \quad i = 1, 2, \dots \tag{4.7}$$

Thus $P_i$ is the long-run frequency of the $i$-th token $\tilde Z_i$. It is easy to see that the distribution of $(P_i)_{i\in\mathbb{N}}$ and $(K_n)_{n\in\mathbb{N}}$ depends only on the EPPF $\pi$ and not on $\nu$. With probability one, it holds that

$$\mu = \sum_{i=1}^{\infty} P_i\,\delta_{\tilde Z_i} + \Big(1 - \sum_{i=1}^{\infty} P_i\Big)\nu. \tag{4.8}$$

Note that $(\tilde Z_n)_{n\in\mathbb{N}}$ is an i.i.d.-$\nu$ sequence, independent of $(P_n)_{n\in\mathbb{N}}$.

Another way of summarizing the combinatorial structure of $(Z_n)_{n\in\mathbb{N}}$ is by the arrival times $\tau$ of tokens, i.e., the random function $\tau : \mathbb{N} \to \mathbb{N}$ given by

$$\tau_j := \inf\{i \le j : Z_i = Z_j\}, \tag{4.9}$$

i.e., $\tau_j$ is the first time the token $Z_j$ appears among $Z_1, \dots, Z_j$. Write $\tau := (\tau_1, \tau_2, \dots)$.

Theorem 4.1. Let $Z$ be a $\pi$-scheme and let $\tau$ be defined as above. Then there exists an i.i.d.-$\nu$ sequence $(U_n)_{n\in\mathbb{N}}$, independent from $\tau$, such that $Z_n = U_{\tau_n}$ a.s. for every $n$.


Proof. By an explicit construction and a transfer argument. 


□ 
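The arrival times (4.9) and the first arrival times (4.2) are straightforward to compute from a finite prefix of the sequence. The Python sketch below is 1-indexed and its names are hypothetical; the final check illustrates the representation in Theorem 4.1 in reverse: composing any sequence of distinct values $u$ with $\tau$ reproduces the same partition.

```python
def arrival_times(zs):
    """tau_j = min{i <= j : z_i = z_j} (Eq. (4.9)), 1-indexed."""
    first = {}
    taus = []
    for j, z in enumerate(zs, start=1):
        first.setdefault(z, j)  # record the first time each value appears
        taus.append(first[z])
    return taus


def first_arrivals(zs):
    """M_1 < M_2 < ...: the times at which new tokens appear (Eq. (4.2))."""
    return [j for j, t in enumerate(arrival_times(zs), start=1) if t == j]
```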
4.1. Projections. We study a $\pi$-scheme, where each new token is marked with an i.i.d. Bernoulli random variable. We will be interested in the distribution of the a.s. limiting relative frequency of these marks, as a function of the mean of these Bernoulli marks. We will be especially interested in the limiting behavior as the mean of these marks converges to zero.

Let $(Z_n)_{n\in\mathbb{N}}$ be a $\pi$-scheme with marginals $\nu = U(0,1)$. For every $q \in [0,1]$ and $n \in \mathbb{N}$, put $\xi_{q,n} := \mathbf{1}(Z_n \le q)$, so that, for every $q \in [0,1]$, the process $(\xi_{q,n})_{n\in\mathbb{N}}$ is an exchangeable sequence in $\{0,1\}$ directed by the random Bernoulli measure with mean

$$Q_q := \mu[0,q], \tag{4.10}$$

i.e., conditioned on $Q_q$, the elements $\xi_{q,1}, \xi_{q,2}, \dots$ are independent Bernoulli random variables with mean $Q_q$.

The process $(Q_q)_{q\in[0,1]}$ can be taken to be the distribution function of the directing random measure, and so we may choose a version of $Q$ so that $q \mapsto Q_q$ is monotonically increasing, continuous from the right with left limits, and satisfies $Q_0 = 0$ and $Q_1 = 1$ surely. By [Kal05, Prop. 1.4], for every $q \in [0,1]$, we have that $Q_q$ is the a.s. limiting frequency of the event $Z_i \le q$ as $i \to \infty$, i.e.,

$$Q_q = \lim_{n\to\infty}\frac{\#\{j \le n : Z_j \le q\}}{n} \quad \text{a.s.} \tag{4.11}$$


It is also clear from Eq. (4.8) that

$$Q_q = \sum_{i=1}^{\infty} P_i\,\delta_{\tilde Z_i}[0,q] + \Big(1 - \sum_{i=1}^{\infty} P_i\Big)q. \tag{4.12}$$

Note that $(\tilde Z_n)_{n\in\mathbb{N}}$ is an i.i.d. sequence of $U(0,1)$ random variables, independent of $(P_n)_{n\in\mathbb{N}}$. Moreover, the law of $Q$ is fully characterized by $\pi$, and vice versa.

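For $k \le n \in \mathbb{Z}_+$, $q \in (0,1]$, and $B \in \mathcal{B}_{[0,1]}$, define

$$\kappa_{n,k}(q,B) := \mathbb{P}\{Q_q \in B \mid S_{q,n} = k\} = \frac{\int_B p^k(1-p)^{n-k}\,\mathbb{P}\{Q_q \in dp\}}{\int_{[0,1]} p^k(1-p)^{n-k}\,\mathbb{P}\{Q_q \in dp\}}, \tag{4.13}$$

where $S_{q,n} := \#\{j \le n : Z_j \le q\}$, and write $\kappa(q,B) := \kappa_{0,0}(q,B) = \mathbb{P}\{Q_q \in B\}$. Let $j \le m \in \mathbb{Z}_+$. By Bayes' rule, we have

$$\kappa_{m+n,\,j+k}(q,B) = \frac{\int_B p^k(1-p)^{n-k}\,\kappa_{m,j}(q,dp)}{\int_{[0,1]} p^k(1-p)^{n-k}\,\kappa_{m,j}(q,dp)}. \tag{4.14}$$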


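Theorem 4.2. $\kappa_{1,1}(q,\cdot) \to \mathbb{P}\{P_1 \in \cdot\}$ weakly as $q \downarrow 0$.

Proof. Note that $\tilde Z_1 = Z_1$ a.s. It follows from Eq. (4.12) that, on the event $\{Z_1 \le q\}$, we have $Q_q = P_1 + \sum_{i\ge2} P_i\,\delta_{\tilde Z_i}[0,q] + (1 - \sum_{i\ge1} P_i)\,q$, and the right-hand side is even independent of $Z_1$. The latter quantity converges a.s. to $P_1$ as $q \downarrow 0$, and so its distribution converges to that of $P_1$. □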

The preceding result suggests that we can extend $\kappa$ from $(0,1]$ to $[0,1]$ in such a way that $\kappa_{n,k}(0,\cdot)$ is defined whenever $\kappa_{n,k}(q,\cdot)$ is defined for some (and then every) $q \in (0,1)$.

For $k \le n \in \mathbb{N}$, define

$$\sigma_{n,k}(B) := \mathbb{P}\{P_1 \in B \mid \#([n] \cap C_1) = k\}. \tag{4.15}$$
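Then $\sigma_1 := \sigma_{1,1}$ is the distribution of $P_1$, the a.s. limiting frequency of the first token $\tilde Z_1$. Bayes' rule implies that

$$\sigma_{n,k}(B) = \frac{\int_B p^{k-1}(1-p)^{n-k}\,\sigma_1(dp)}{\int_{[0,1]} p^{k-1}(1-p)^{n-k}\,\sigma_1(dp)}, \qquad B \in \mathcal{B}_{[0,1]}. \tag{4.16}$$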


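The next lemma extends $\kappa$ as $q \downarrow 0$ and shows that the limiting behavior of $\kappa$ is determined by $\sigma_1$:

Lemma 4.3. Let $k \le n \in \mathbb{Z}_+$. As $q \downarrow 0$,

$$\kappa_{n,k}(q,\cdot) \to \begin{cases} \delta_0, & k = 0 \text{ or } n = 0, \\ \sigma_{n,k}, & k \ge 1, \end{cases} \quad \text{weakly.} \tag{4.17}$$

Proof. From Eq. (4.12), it is clear that $Q_q \to 0$ as $q \downarrow 0$ a.s., and so $\mathbb{P}\{Q_q \in \cdot\} = \kappa_{0,0}(q,\cdot)$ converges weakly to $\delta_0$. To see that this holds when $k = 0$, note that $\kappa_{n,0}(q,\cdot)$ and $\kappa_{0,0}(q,\cdot)$ have the same weak limit as $q \downarrow 0$, because $\mathbb{P}\{S_{q,n} = 0\} \to 1$ as $q \downarrow 0$.

Now assume $k \ge 1$. The result follows from Theorem 4.2, the identity Eq. (4.14) with $m = j = 1$, and the boundedness and continuity of the map $p \mapsto p^{k-1}(1-p)^{n-k}$ on $[0,1]$. □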


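Lemma 4.3 implies that we may define

$$\kappa_{n,k}(0,\cdot) := \lim_{q\downarrow0}\kappa_{n,k}(q,\cdot), \tag{4.18}$$

where the limit is understood in terms of weak convergence (not setwise). Recall that $p^{(n,0)}(p) = (1-p)^n$. We can give the following characterization of $\sigma_1$ in terms of $(P_n)_{n\in\mathbb{N}}$ and $(K_n)_{n\in\mathbb{N}}$:

Lemma 4.4. For every $n \in \mathbb{N}$ and $B \in \mathcal{B}_{[0,1]}$, we have

$$\big(p^{(n-1,0)}\sigma_1\big)(B) = \mathbb{P}\{P_{K_n} \in B,\ K_n > K_{n-1}\}. \tag{4.19}$$

Proof. We have $\mathbb{P}[C_1 \cap [n] = \{1\},\ P_1 \in B \mid P_1] = (1-P_1)^{n-1}\mathbf{1}_B(P_1)$. The expectation of the latter is $(p^{(n-1,0)}\sigma_1)(B)$. But then, by exchangeability, $\mathbb{P}\{C_1 \cap [n] = \{1\},\ P_1 \in B\} = \mathbb{P}\{C_{K_n} \cap [n] = \{n\},\ P_{K_n} \in B\}$ and $\{C_{K_n} \cap [n] = \{n\}\} = \{K_n > K_{n-1}\}$. □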

Define $\Lambda_n := \mathbb{P}\{K_n > K_{n-1}\}$. From the preceding result, with $B = [0,1]$, we have $\Lambda_n = \int_{[0,1]}(1-p)^{n-1}\,\sigma_1(dp)$.
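As an illustration of the identity $\Lambda_n = \int(1-p)^{n-1}\sigma_1(dp)$ (Eq. (4.19) with $B = [0,1]$), consider the Blackwell-MacQueen case of Eq. (3.1), where it is a standard fact that the limiting frequency $P_1$ of the first token is $\mathrm{Beta}(1,\theta)$-distributed, so $\sigma_1$ has density $\theta(1-p)^{\theta-1}$. Then $\Lambda_n = \theta/(\theta+n-1)$, which is the probability in Eq. (3.1) of a new value at step $n$. The Python sketch below (names hypothetical) checks this by midpoint-rule quadrature, taking $\theta \ge 1$ so that the density is bounded.

```python
def new_token_probability(n, sigma1_density, grid=20_000):
    """Midpoint-rule approximation of Lambda_n = int (1 - p)^(n-1) sigma_1(dp)."""
    h = 1.0 / grid
    return sum((1 - (i + 0.5) * h) ** (n - 1) * sigma1_density((i + 0.5) * h)
               for i in range(grid)) * h


def beta_1_theta_density(theta):
    """Density of Beta(1, theta): p -> theta * (1 - p)^(theta - 1)."""
    return lambda p: theta * (1 - p) ** (theta - 1)
```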


5. Exchangeable sequences of Bernoulli processes 

For a measure $R$ on $\Omega$, let $C(R)$ be the simple measure on $\Omega \times (0,\infty)$ given by $C(R) := \sum_{s\in\Omega:\,R\{s\}>0}\delta_{(s,R\{s\})}$. It can be shown that the map $R \mapsto C(R)$ is measurable. Let $\Psi[x,p] := 1 - p + p\,e^{-x}$ be the moment generating function of a Bernoulli random variable with mean $p$ evaluated at $-x$, and, given a measurable function $f$ on $\Omega$, write $\Psi f$ for the map $(s,p) \mapsto \Psi[f(s),p]$.
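When $H$ is purely atomic with finitely many atoms, the conditional Laplace functional of a Bernoulli process with hazard measure $H$ is $\mathbb{E}^H e^{-Xf} = \prod_s \Psi[f(s), H\{s\}] = \exp\{C(H)\log\Psi f\}$. The Python sketch below (names hypothetical; the hazard is a finite dict from atom label to probability) compares this product with a brute-force sum over all outcomes of $X$.

```python
import itertools
import math


def psi(x, p):
    """Psi[x, p] = 1 - p + p e^{-x}: the Bernoulli(p) mgf evaluated at -x."""
    return 1.0 - p + p * math.exp(-x)


def laplace_by_enumeration(hazard, f):
    """E exp(-X f) for a Bernoulli process X with purely atomic hazard
    {atom: probability}, summing over all 2^|atoms| outcomes of X."""
    atoms = list(hazard)
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(atoms)):
        prob, xf = 1.0, 0.0
        for s, b in zip(atoms, bits):
            prob *= hazard[s] if b else 1.0 - hazard[s]
            xf += b * f(s)
        total += prob * math.exp(-xf)
    return total
```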

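Theorem 5.1. Let $g, f_1, \dots, f_n \ge 0$ be measurable, and put $g'(s,p) = p\,g(s)$. Let $H$ be a completely random hazard measure with nonrandom nonatomic component $\mu$, fixed atoms $\mathcal{U}$, and Lévy measure $\eta$ on $\Omega \times (0,1]$. Conditioned on $H$, let $X_1, \dots, X_n$ be independent Bernoulli processes with mean $H$. Then

$$\log\mathbb{E}\exp\Big\{-Hg - \sum_{j=1}^{n} X_j f_j\Big\} = -\mu\Big(g + \sum_{j=1}^{n}(1 - e^{-f_j})\Big) - \eta\Big(1 - e^{-g'}\prod_{j=1}^{n}\Psi f_j\Big) \tag{5.1}$$

$$\qquad + \sum_{s\in\mathcal{U}}\log\Big\{\big(\delta_s \otimes \mathbb{P}[H\{s\} \in \cdot\,]\big)\Big(e^{-g'}\prod_{j=1}^{n}\Psi f_j\Big)\Big\}. \tag{5.2}$$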
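Proof. We have

$$\mathbb{E}^H\big[\exp\{-X_j f_j\}\big] = \exp\big\{-\mu(1 - e^{-f_j}) + C(H)\log\Psi f_j\big\}. \tag{5.3}$$

It follows from the chain rule of conditional expectation and then Theorem 2.2 that

$$\mathbb{E}\exp\Big\{-Hg - \sum_{j=1}^{n} X_j f_j\Big\} = \mathbb{E}\Big[\exp\{-Hg\}\prod_{j=1}^{n}\mathbb{E}^H\big[\exp\{-X_j f_j\}\big]\Big]$$

$$= \mathbb{E}\Big[\exp\Big\{-\mu\Big(g + \sum_{j=1}^{n}(1 - e^{-f_j})\Big) - C(H)\Big(g' - \log\prod_{j=1}^{n}\Psi f_j\Big)\Big\}\Big]. \tag{5.4}$$

Let $H_{\mathcal{U}} := H(\cdot \cap \mathcal{U})$. Then by Campbell's theorem,

$$\mathbb{E}\Big[\exp\Big\{-C(H - H_{\mathcal{U}})\Big(g' - \log\prod_{j=1}^{n}\Psi f_j\Big)\Big\}\Big] = \exp\Big\{-\eta\Big(1 - e^{-g'}\prod_{j=1}^{n}\Psi f_j\Big)\Big\}. \tag{5.5}$$

On the other hand, by complete randomness, $H_{\mathcal{U}}$ and $H - H_{\mathcal{U}}$ are independent and

$$\mathbb{E}\Big[\exp\Big\{-C(H_{\mathcal{U}})\Big(g' - \log\prod_{j=1}^{n}\Psi f_j\Big)\Big\}\Big] = \prod_{s\in\mathcal{U}}\big(\delta_s \otimes \mathbb{P}[H\{s\} \in \cdot\,]\big)\Big(e^{-g'}\prod_{j=1}^{n}\Psi f_j\Big), \tag{5.6}$$

completing the proof. □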

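We may write $\eta = \nu \otimes \kappa$ for some measure $\nu$ on $\Omega$ and kernel $\kappa$ from $\Omega$ to $(0,1]$. In the case that $\mu$ is absolutely continuous with respect to $\nu$, we can give a unified characterization of the ordinary components of $X_1, \dots, X_n$.

Theorem 5.2. Let $\nu \otimes \kappa$ be a disintegration of $\eta$, assume that $\mu \ll \nu$, and let $\lambda$ satisfy $\mu = \lambda\nu$. Then

$$-\mu\Big(\sum_{j=1}^{n}(1 - e^{-f_j})\Big) - \eta\Big(1 - \prod_{j=1}^{n}\Psi f_j\Big) \tag{5.7}$$

$$= -\sum_{\emptyset \ne J \subseteq [n]}\nu\Big(\big(\lambda\,\mathbf{1}\{\#J = 1\} + \kappa p^{(n,\#J)}\big)\Big(1 - e^{-\sum_{j\in J} f_j}\Big)\Big), \tag{5.8}$$

where $\kappa p^{(n,r)}(s) := \int\kappa(s,dp)\,p^r(1-p)^{n-r}$.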

The partial sums $S_n := X_1 + \dots + X_n$ are key quantities when studying the conditional distribution of $H$. The next result characterizes the joint law of $S_n$ and $H$. Write $\tilde H := H(\cdot \setminus \mathcal{U})$ and $\tilde S_n := S_n(\cdot \setminus \mathcal{U})$, and let $\mathrm{Bin}(n,p)$ denote the binomial distribution on $\{0, 1, \dots, n\}$ with mean $np$ and variance $np(1-p)$. For every $r \le n \in \mathbb{Z}_+$, define $p^{(n,r)} : [0,1] \to [0,1]$ by $p^{(n,r)}(p) := p^r(1-p)^{n-r}$.

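Corollary 5.3. Let $f, g \ge 0$ be measurable, let $f''(s,p,k) = f'(s,k) = k\,f(s)$ and $g''(s,p,k) = g'(s,p) = p\,g(s)$, and let $\mathrm{Bin}^{(n)}(p,\cdot) = \mathrm{Bin}(n,p)$. Then

$$\log\mathbb{E}e^{-\tilde H g - \tilde S_n f} = -\mu\big(g + n(1 - e^{-f})\big) - \eta\big(1 - e^{-g'}(\Psi f)^n\big) \tag{5.9}$$

$$= -\mu\big(g + n(1 - e^{-f})\big) - \big(\nu \otimes (\kappa \otimes \mathrm{Bin}^{(n)})\big)\big(1 - e^{-g''-f''}\big), \tag{5.10}$$

where $\nu \otimes \kappa$ is a disintegration of $\eta$. In particular, for $h \ge 0$ measurable,

$$\mathbb{E}e^{-C(\tilde S_n)h} = \exp\big\{-\big(n\mu \otimes \delta_1 + \nu \otimes \kappa\mathrm{Bin}^{(n)}_{[n]}\big)(1 - e^{-h})\big\}, \tag{5.11}$$

where $\mathrm{Bin}^{(n)}_A(p,\cdot)$ is the restriction of the measure $\mathrm{Bin}(n,p)$ to the set $A$.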

Proof. The first equality follows from Theorem 5.1, taking $f_j = f$. To establish the second equality, note that

$$(\Psi f)^n(s,p) = \big(1 - p + p\,e^{-f(s)}\big)^n = \sum_{k=0}^{n}\binom{n}{k}p^k(1-p)^{n-k}e^{-kf(s)} = \mathrm{Bin}(n,p)\big(k \mapsto e^{-kf(s)}\big). \tag{5.12}$$

The second claim follows from Theorem 2.2 after taking $g = 0$, and noting that $C(\tilde S_n)$ is a Poisson process on $\Omega \times \mathbb{N}$. □
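Identity (5.12) is the binomial theorem applied to $\Psi$. A short numerical check in Python (function names hypothetical), including the boundary cases $p = 0$ and $p = 1$:

```python
import math


def psi(x, p):
    """Psi[x, p] = 1 - p + p e^{-x}."""
    return 1.0 - p + p * math.exp(-x)


def binomial_side(n, p, x):
    """Bin(n, p)(k -> e^{-k x}) = sum_k C(n, k) p^k (1 - p)^(n - k) e^{-k x}."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * math.exp(-k * x)
               for k in range(n + 1))
```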
For a kernel $\kappa$ from $S$ to $T$, let $\hat\kappa$ be the kernel from $S$ to $S \times T$ given by $\hat\kappa_s := \delta_s \otimes \kappa_s$. For a finite measure $\tau$ on a space $S$, let $\bar\tau$ be the probability measure $\tau/(\tau S)$.


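Theorem 5.4. Let $g \ge 0$ be measurable, let $\tilde H$ and $\tilde S_n$ be as above, let $\nu \otimes \kappa$ be a disintegration of $\eta$, assume that $\mu \ll \nu$, and let $\lambda$ satisfy $\mu = \lambda\nu$. Then a.s.

$$\log\mathbb{E}^{\tilde S_n}\big[e^{-\tilde H g}\big] = -\mu g - \big(\nu \otimes p^{(n,0)}\kappa\big)\big(1 - e^{-g'}\big) + C(\tilde S_n)\log\hat{\bar\rho}^{(n)}e^{-\tilde g}, \tag{5.13}$$

where $\tilde g(s,k,p) := g'(s,p) := p\,g(s)$ and

$$\rho^{(n)}_{s,k} := \lambda(s)\,p^{(n-1,k-1)}(0)\,\delta_0 + p^{(n,k)}\kappa_s. \tag{5.14}$$

For $n \ge k \ge 1$, note that $p^{(n-1,k-1)}(0)\,\delta_0$ is $\delta_0$ when $k = 1$ and is otherwise the null measure. The following result is the key identity.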

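Lemma 5.5. Define $\pi(s,p,k) := (s,k,p)$. Then $\rho := \bar\rho^{(n)}$ satisfies

$$\big(\lambda\nu \otimes n\delta_1 + \nu \otimes \kappa\mathrm{Bin}^{(n)}_{[n]}\big) \otimes \rho = \big(\lambda\nu \otimes \delta_0 \otimes n\delta_1 + \nu \otimes \kappa \otimes \mathrm{Bin}^{(n)}_{[n]}\big) \circ \pi^{-1}. \tag{5.15}$$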


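Proof. Let $h \ge 0$ be measurable, and define

$$h'(s,p,k) := \frac{h(s,k,p)}{g'(s,p,k) + f'(s,p,k)}, \tag{5.16}$$

where $g'(s,p,k) := (\kappa p^{(n,k)})(s) := \int\kappa(s,dp')\,p'^{k}(1-p')^{n-k}$ and

$$f'(s,p,k) := \lambda(s)\,p^{(n-1,k-1)}(0) = \lambda(s)\,\delta_1\{k\}. \tag{5.17}$$

Let $c_n$ denote counting measure on $[n]$, and let $c^{(n)}_n := \binom{n}{\cdot}c_n$. Noting that $\mathrm{Bin}^{(n)}_{[n]}(p,\cdot) = p^{(n,\cdot)}(p)\,c^{(n)}_n$ and $\mathrm{Bin}^{(n)}_{\{1\}}(p,\cdot) = p^{(n,1)}(p)\,n\delta_1$, it is straightforward to verify that

$$(\lambda\nu \otimes n\delta_1 \otimes \rho)h = (\lambda\nu \otimes \delta_0 \otimes n\delta_1)f'h' + (\nu \otimes \kappa \otimes \mathrm{Bin}^{(n)}_{[n]})f'h' \tag{5.18}$$

and

$$(\nu \otimes \kappa\mathrm{Bin}^{(n)}_{[n]} \otimes \rho)h = (\lambda\nu \otimes \delta_0 \otimes n\delta_1)g'h' + (\nu \otimes \kappa \otimes \mathrm{Bin}^{(n)}_{[n]})g'h'. \tag{5.19}$$

Summing and using the identity $(f' + g')h' = h \circ \pi$ completes the proof. □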


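Proof of Theorem 5.4. Let $f \ge 0$ be measurable. By the chain rule of conditional expectation,

$$\mathbb{E}\big[e^{-\tilde H g - \tilde S_n f}\big] = \mathbb{E}\big[e^{-\tilde S_n f}\,\mathbb{E}^{\tilde S_n}\big[e^{-\tilde H g}\big]\big], \tag{5.20}$$

and so it suffices to show that Eq. (5.9) in Corollary 5.3 is equal to the logarithm of

$$\exp\big\{-\mu g - \big(\nu \otimes p^{(n,0)}\kappa\big)\big(1 - e^{-g'}\big)\big\}\,\mathbb{E}\exp\big\{C(\tilde S_n)\log\hat{\bar\rho}^{(n)}e^{-\tilde f - \tilde g}\big\}, \tag{5.21}$$

where $\tilde f(s,k,p) := k\,f(s)$. By Corollary 5.3, the identity $\mu = \lambda\nu$, and the fact that $\bar\rho^{(n)}$ is a probability kernel,

$$\log\mathbb{E}\exp\big\{C(\tilde S_n)\log\hat{\bar\rho}^{(n)}e^{-\tilde f - \tilde g}\big\} = -\big(n\lambda\nu \otimes \delta_1 + \nu \otimes \kappa\mathrm{Bin}^{(n)}_{[n]}\big)\hat{\bar\rho}^{(n)}\big(1 - e^{-\tilde f - \tilde g}\big). \tag{5.22}$$

By Lemma 5.5 and the identity

$$\kappa \otimes \mathrm{Bin}^{(n)} = \kappa \otimes \mathrm{Bin}^{(n)}_{[n]} + p^{(n,0)}\kappa \otimes \delta_0, \tag{5.23}$$

we can rewrite the right-hand side of Eq. (5.22) as

$$-n\mu(1 - e^{-f}) - \big(\nu \otimes (\kappa \otimes \mathrm{Bin}^{(n)})\big)\big(1 - e^{-f'' - g''}\big) + \big(\nu \otimes p^{(n,0)}\kappa\big)\big(1 - e^{-g'}\big), \tag{5.24}$$

where $f'' = \tilde f \circ \pi$ and $g'' = \tilde g \circ \pi$. Substituting back into Eq. (5.21), we find agreement with Eq. (5.9), completing the proof. □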
In the next section, we introduce a generalization of the one-parameter scheme.


6. The continuum-of-urns scheme 

In this section, we define a generalization of the one-parameter scheme; show that it is an exchangeable sequence of Bernoulli processes; and then characterize the directing random hazard measure. As a result, we define a family of completely random hazard measures generalizing beta processes. These random measures can be similarly organized into hierarchies and admit a straightforward urn scheme characterizing the conditionally i.i.d. sequences of Bernoulli processes that they direct.


6.1. Construction. Let $Y := (Y_n)_{n\in\mathbb{N}}$ be a nonrandom sequence of simple measures on $(\Omega, \mathcal{A})$ concentrated on a nonrandom locally-finite countable set $\mathcal{U}$, and let $(\pi_s)_{s\in\Omega}$ be a (measurable) family of EPPFs, i.e., one such that the map $s \mapsto \pi_s(\mathbf{n})$ is measurable for every $\mathbf{n} \in \mathbb{N}^*$. For every $s \in \mathcal{U}$, let $Z^s := (Z^s_1, Z^s_2, \dots)$ be an independent $\pi_s$-scheme, and let $\tau^s$ be the arrival times of the tokens in $Z^s$. Consider the sequence $X := (X_n)_{n\in\mathbb{N}}$ of simple point processes, concentrated on $\mathcal{U}$, where, for every $n \in \mathbb{N}$ and $s \in \mathcal{U}$,

$$X_n\{s\} = Y_{\tau^s_n}\{s\}. \tag{6.1}$$

Equivalently, $X_1 := Y_1$ and, for every $n \in \mathbb{N}$ and $s \in \mathcal{U}$,

$$X_{n+1}\{s\} := \begin{cases} X_j\{s\}, & \text{if } Z^s_{n+1} = Z^s_j, \text{ where } j \le n, \\ Y_{n+1}\{s\}, & \text{otherwise}. \end{cases} \tag{6.2}$$

Let $Q_Y$ be the law of $X$, which is clearly a measurable function of $Y$.

Definition 6.1 (continuum-of-urns scheme). Let $H_0$ be a hazard measure on $(\Omega, \mathcal{A})$ and let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random measures on $(\Omega, \mathcal{A})$. We call $(X_n)_{n\in\mathbb{N}}$ a continuum-of-urns scheme induced by the (measurable) EPPF family $(\pi_s)_{s\in\Omega}$ and hazard measure $H_0$ when there exists an i.i.d. sequence $(Y_n)_{n\in\mathbb{N}}$ of Bernoulli processes with hazard measure $H_0$ such that

$$\mathbb{P}\big[(X_n)_{n\in\mathbb{N}} \mid (Y_n)_{n\in\mathbb{N}}\big] = Q_{(Y_n)_{n\in\mathbb{N}}} \quad \text{a.s.} \tag{6.3}$$

We may also say that $(X_n)_{n\in\mathbb{N}}$ is a continuum-of-urns scheme induced by $(\pi_s)_{s\in\Omega}$ and $(Y_n)_{n\in\mathbb{N}}$. A continuum-of-urns scheme induced by an EPPF family $(\pi_s)_{s\in\Omega}$ is said to be homogeneous if, for some EPPF $\pi$ and every $s \in \Omega$, we have $\pi_s = \pi$, and nonhomogeneous otherwise.


Remark 6.2 (relationship with one-parameter scheme). Let $\theta > 0$ and consider the EPPF $\pi$ given by

$$\pi(n_1, \dots, n_k) = \frac{\theta^{k-1}\prod_{i=1}^{k}(n_i - 1)!}{[1+\theta]_{n-1}}, \tag{6.4}$$

where $n = \sum_{i=1}^{k} n_i$ and $[x]_m := x(x+1)\cdots(x+m-1)$ denotes the rising factorial. This EPPF is that associated with the exchangeable sequence characterized by Eq. (3.1), i.e., a Blackwell-MacQueen urn scheme. It is therefore immediate that the above definition of a continuum-of-urns scheme specializes to that of the one-parameter scheme with concentration parameter $\theta$. Thus, in this special case, $(X_n)_{n\in\mathbb{N}}$ is an exchangeable sequence of Bernoulli processes directed by a beta process. We can generalize the one-parameter scheme by fixing a (measurable) concentration function $\theta : \Omega \to \mathbb{R}_+$ and constructing a continuum-of-urns scheme induced by the measurable family $(\pi_s)_{s\in\Omega}$, where $\pi_s$ is given by Eq. (6.4) for $\theta = \theta(s)$. In this case, the sequence remains exchangeable and directed by a nonhomogeneous beta process. ([TJ07] discuss this particular nonhomogeneous case.) The next few results show that a general continuum-of-urns scheme is also exchangeable and directed by a completely random hazard measure. ◁

Remark 6.3 (simulability). Note that in a computer simulation of the processes $X_1, \dots, X_n$, one need only generate $Z^s_1, \dots, Z^s_n$ for each atom $s$ of $Y_j$ and $j \le n$. ◁
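Remark 6.3 suggests a direct simulation. The Python sketch below is an illustration under stated assumptions: each $Y_n$ is a finite set of atom labels, and each $\pi_s$ is taken, purely as an example, to be a two-parameter (Pitman-Yor) Chinese restaurant process with per-atom parameters $(\alpha(s), \theta(s))$, $0 \le \alpha < 1$, $\theta > 0$, whose predictive rule gives a new token probability $(\theta + \alpha k)/(\theta + n)$ and an existing token with $n_j$ copies probability $(n_j - \alpha)/(\theta + n)$. The parameter function `params` and all names are hypothetical; $\alpha(s) = 0$ recovers the Blackwell-MacQueen urn. If $Y_1 = \delta_a$ and $Y_2 = 0$, then $a$ is an atom of $X_2$ exactly when $Z^a_2 = Z^a_1$, which has probability $(1-\alpha(a))/(\theta(a)+1)$.

```python
import random


def py_urn_step(counts, alpha, theta, rng):
    """One draw from a two-parameter urn with token multiplicities `counts`.

    Returns the index of an existing token, or len(counts) for a new token.
    """
    n, k = sum(counts), len(counts)
    u = rng.random() * (theta + n)
    if u < theta + alpha * k:
        return k
    u -= theta + alpha * k
    for j, c in enumerate(counts):
        u -= c - alpha
        if u < 0:
            return j
    return k - 1  # guard against floating-point round-off


def continuum_of_urns(ys, params, rng):
    """Sketch of Eq. (6.2): ys[n] is the finite atom set of Y_{n+1}; params(s)
    returns the hypothetical per-atom (alpha, theta) choosing pi_s."""
    atoms = set().union(*ys)
    counts = {s: [] for s in atoms}  # token multiplicities of the urn Z^s
    value = {s: [] for s in atoms}   # X{s} attached to each token of Z^s
    xs = []
    for y in ys:
        x = set()
        for s in atoms:
            alpha, theta = params(s)
            t = py_urn_step(counts[s], alpha, theta, rng)
            if t == len(counts[s]):  # new token: record Y_{n+1}{s}
                counts[s].append(0)
                value[s].append(s in y)
            counts[s][t] += 1
            if value[s][t]:
                x.add(s)
        xs.append(x)
    return xs
```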

For the remainder of the section, let $(X_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme induced by the measurable family $(\pi_s)_{s\in\Omega}$ and an i.i.d. sequence $(Y_n)_{n\in\mathbb{N}}$ of Bernoulli processes with hazard measure $H_0$.

6.2. Tetrahedral construction. In order to characterize the law of $(X_n)_{n\in\mathbb{N}}$, it will be useful to return to its construction and produce a richer process from which we can derive $(X_n)_{n\in\mathbb{N}}$. For every $s \in \mathcal{U}$, let $M^s := (M^s_1, M^s_2, \dots)$ be the first arrival times of the unique tokens in $Z^s$, and recall that, on the event that there are only $j$ unique tokens, $M^s_k = \infty$ a.s. for every $k > j$. Let $X^k_{n,m}$, for $k \le m \le n \in \mathbb{N}$, be the tetrahedral array of simple point processes, concentrated on $\mathcal{U}$, such that, for every $s \in \mathcal{U}$,

$$X^k_{n,m}\{s\} := \mathbf{1}(M^s_k = m)\,\mathbf{1}(\tau^s_n = m)\,Y_m\{s\}. \tag{6.5}$$

It is easy to verify that, for every $n \in \mathbb{N}$,

$$X_n = \sum_{k\le m\le n} X^k_{n,m} \quad \text{a.s.} \tag{6.6}$$

Writing $\mathcal{Q}_Y$ for the law of the tetrahedral array of simple point processes, a transfer argument implies that there exists a tetrahedral array $(X^k_{n,m})$ of simple point processes such that

$$\mathbb{P}\big[(X^k_{n,m} : k \le m \le n \in \mathbb{N}) \mid Y\big] = \mathcal{Q}_Y \quad \text{a.s.}, \tag{6.7}$$

and therefore, for every $n \in \mathbb{N}$,

$$X_n = \sum_{k\le m\le n} X^k_{n,m} \quad \text{a.s.} \tag{6.8}$$
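The decomposition (6.6) can be verified pathwise for a single atom $s$. The Python sketch below (names hypothetical) builds the entries $X^k_{n,m}\{s\}$ of Eq. (6.5) from a finite token sequence $Z^s$ and marks $Y_m\{s\}$, omitting the entries with $M^s_k = \infty$ (which vanish), and checks that summing over $k \le m \le n$ returns $Y_{\tau^s_n}\{s\}$, in agreement with Eq. (6.1).

```python
def tetrahedral_entries(zs, ys):
    """For one atom s: zs = (Z^s_1, ..., Z^s_N), ys = (Y_1{s}, ..., Y_N{s}).

    Returns {(k, m, n): X^k_{n,m}{s}} for k <= m <= n <= N (Eq. (6.5)),
    1-indexed; entries for tokens that never appear (M_k = infinity) are omitted.
    """
    first, taus, ms = {}, [], []
    for j, z in enumerate(zs, start=1):
        if z not in first:
            first[z] = j
            ms.append(j)  # M_k = j for the k-th token to appear
        taus.append(first[z])
    big_n = len(zs)
    return {(k, m, n): int(ms[k - 1] == m) * int(taus[n - 1] == m) * ys[m - 1]
            for n in range(1, big_n + 1)
            for m in range(1, n + 1)
            for k in range(1, m + 1)
            if k <= len(ms)}
```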


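6.3. The law via characteristic functionals. We begin by characterizing the probability kernel $Y \mapsto \mathcal{Q}_Y$. For every $n \in \mathbb{N}$ and family $f = (f^k_{j,m})_{k\le m\le j\le n}$ of nonnegative measurable functions on $\Omega$, it follows from the independence of the sequences $Z^s$ that

$$\log\mathbb{E}\Big(\exp\Big\{-\sum_{k\le m\le j\le n} X^k_{j,m}f^k_{j,m}\Big\}\Big) = \sum_{s\in\mathcal{U}}\log\chi_f(s, Y_1\{s\}, \dots, Y_n\{s\}), \tag{6.9}$$

where

$$\chi_f(s, y_1, \dots, y_n) := \mathbb{E}_{\pi_s}\Big(\exp\Big\{-\sum_{k\le m\le j\le n}\mathbf{1}(M_k = m)\,\mathbf{1}(\tau_j = m)\,y_m\,f^k_{j,m}(s)\Big\}\Big). \tag{6.10}$$

Introduce on $\Omega \times (\{0,1\}^n \setminus \{0\}^n)$ the measure

$$U_n := \sum_{s}\delta_{(s,\,(Y_1\{s\}, \dots, Y_n\{s\}))}. \tag{6.11}$$

It follows from Eqs. (6.7) and (6.9) that a.s.

$$\mathbb{E}^{Y}\Big[\exp\Big\{-\sum_{k\le m\le j\le n} X^k_{j,m}f^k_{j,m}\Big\}\Big] = \exp\{U_n\log\chi_f\}. \tag{6.12}$$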
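Proposition 6.4. Let $R_n := Y_1 + \dots + Y_n$ and define

$$\bar\chi_f(s,r) := \binom{n}{r}^{-1}\sum_{y\in\{0,1\}^n:\,\sum_j y_j = r}\chi_f(s, y_1, \dots, y_n). \tag{6.13}$$

Then a.s.

$$\mathbb{E}^{R_n}\Big[\exp\Big\{-\sum_{k\le m\le j\le n} X^k_{j,m}f^k_{j,m}\Big\}\Big] = \exp\{C(R_n)\log\bar\chi_f\}. \tag{6.14}$$

Proof. Let $g, g_1, \dots, g_n \ge 0$ be measurable functions and define $g'(s,k) := k\,g(s)$. Then

$$\log\mathbb{E}\exp\{-R_n g\} = -n\tilde H_0(1 - e^{-g}) + C(H_0)\log\mathrm{Bin}^{(n)}e^{-g'}. \tag{6.15}$$

It is straightforward to verify that a.s.

$$\log\mathbb{E}^{R_n}\Big[\exp\Big\{-\sum_{j\le n} Y_j g_j\Big\}\Big] = C(R_n)\log\Big(s, r \mapsto \binom{n}{r}^{-1}\sum_{y:\,\sum_j y_j = r}\exp\Big\{-\sum_{j\le n} y_j g_j(s)\Big\}\Big), \tag{6.16}$$

from which we can infer that a.s.

$$\log\mathbb{E}^{R_n}\big[\exp\{-U_n h\}\big] = C(R_n)\log\Big(s, r \mapsto \binom{n}{r}^{-1}\sum_{y:\,\sum_j y_j = r}\exp\{-h(s, y_1, \dots, y_n)\}\Big). \tag{6.17}$$

The proof then follows from Eqs. (6.12) and (6.17). □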

The following result characterizes the law of the tetrahedral array, and highlights the distinct roles played by the atoms and the nonatomic part $\tilde H_0$ of the hazard measure $H_0$.

Proposition 6.5. Let $n \in \mathbb{N}$, let $\mathrm{Bin}^{(n)}(p,\cdot)$ denote the distribution of the sum of $n$ independent Bernoulli random variables, each with mean $p \in (0,1]$, and let $f = (f^k_{j,m})_{k\le m\le j\le n}$ be a family of nonnegative measurable functions on $\Omega$. Then

$$\log\mathbb{E}\exp\Big\{-\sum_{k\le m\le j\le n} X^k_{j,m}f^k_{j,m}\Big\} = -n\tilde H_0\big(1 - \bar\chi_f(\cdot,1)\big) + C(H_0)\log\mathrm{Bin}^{(n)}\bar\chi_f. \tag{6.19}$$

Proof. Follows from Proposition 6.4 and Eq. (6.15). □

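We now characterize the law of $(X_n)_{n\in\mathbb{N}}$.

Proposition 6.6. Let $f = (f_j)_{j\in[n]}$ be a family of nonnegative measurable functions on $\Omega$ and let

$$\bar\chi_f(s,r) := \binom{n}{r}^{-1}\sum_{y\in\{0,1\}^n:\,\sum_j y_j = r}\mathbb{E}_{\pi_s}\Big(\exp\Big\{-\sum_{j\le n} y_{\tau_j}f_j(s)\Big\}\Big). \tag{6.20}$$

Then

$$\log\mathbb{E}\exp\Big\{-\sum_{j\le n} X_j f_j\Big\} = -n\tilde H_0\big(1 - \bar\chi_f(\cdot,1)\big) + C(H_0)\log\mathrm{Bin}^{(n)}\bar\chi_f. \tag{6.21}$$

Proof. Follows from Proposition 6.5 for $f^k_{j,m} := f_j$. □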

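While Proposition 6.6 characterizes the law of $X$, it is relatively opaque. Consider the fixed component: We have

$$\mathrm{Bin}^{(n)}\bar\chi_f(s,\cdot) = \Big(q \mapsto \mathbb{E}_{\pi_s}\Big(\exp\Big\{-\sum_{j\le n}\mathbf{1}(U_{\tau_j} \le q)\,f_j(s)\Big\}\Big)\Big) \tag{6.22}$$

$$= \Big(q \mapsto \mathbb{E}_{\pi_s}\Big(\exp\Big\{-\sum_{j\le n}\mathbf{1}(Z_j \le q)\,f_j(s)\Big\}\Big)\Big) \tag{6.23}$$

$$= \Big(q \mapsto \int\kappa^{\pi_s}(q,dp)\prod_{j\le n}\Psi[f_j(s),p]\Big), \tag{6.24}$$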


where $U_1, U_2, \dots$ are independent uniformly distributed random variables independent of $\tau_1, \tau_2, \dots$ satisfying $Z_n = U_{\tau_n}$ a.s., and $\kappa^{\pi_s}$ denotes the kernel $\kappa$ of Eq. (4.13) for a $\pi_s$-scheme with marginals $U(0,1)$.

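Now consider the ordinary component: Let $U$ be a uniformly distributed random variable, independent from $(\tau_n)_{n\in\mathbb{N}}$. We have

$$n\big(1 - \bar\chi_f(s,1)\big) = \sum_{y\in\{0,1\}^n:\,\sum_j y_j = 1}\mathbb{E}_{\pi_s}\Big(1 - \exp\Big\{-\sum_{j\le n} y_{\tau_j}f_j(s)\Big\}\Big) \tag{6.25}$$

$$= n\,\mathbb{E}_{\pi_s}\Big(1 - \exp\Big\{-\sum_{j\le n}\mathbf{1}(\tau_j = \lceil nU\rceil)\,f_j(s)\Big\}\Big) \tag{6.26}$$

$$= n\,\mathbb{E}_{\pi_s}\Big(\mathbf{1}(\tau_{\lceil nU\rceil} = \lceil nU\rceil)\Big(1 - \exp\Big\{-\sum_{j\le n}\mathbf{1}(\tau_j = \tau_{\lceil nU\rceil})\,f_j(s)\Big\}\Big)\Big). \tag{6.27}$$

By exchangeability, for all $m \le n$,

$$\#\{j \le n : \tau_j = \tau_m\} \overset{d}{=} N_{1n}, \tag{6.28}$$

and, conditioned on $\#\{j \le n : \tau_j = \tau_{\lceil nU\rceil}\} = k$, the vector $(\mathbf{1}(\tau_1 = \tau_{\lceil nU\rceil}), \dots, \mathbf{1}(\tau_n = \tau_{\lceil nU\rceil}))$ is uniformly distributed among those vectors with $k$ ones and $n - k$ zeros, while, given this vector, $\lceil nU\rceil$ is uniformly distributed among the positions of its ones, so that $\tau_{\lceil nU\rceil} = \lceil nU\rceil$ with conditional probability $1/k$. Finally, note that, conditioned on $P_1$, $N_{1n} - 1$ is binomially distributed with mean $(n-1)P_1$ and variance $(n-1)P_1(1-P_1)$. It follows that

$$n\big(1 - \bar\chi_f(s,1)\big) = \sum_{k=1}^{n}\frac{n}{k}\,\mathbb{P}_{\pi_s}\{N_{1n} = k\}\binom{n}{k}^{-1}\sum_{y:\,\sum_j y_j = k}\Big(1 - \exp\Big\{-\sum_{j\le n} y_j f_j(s)\Big\}\Big) \tag{6.29}$$

$$= \sum_{k=1}^{n}\mathbb{E}_{\pi_s}\big[P_1^{k-1}(1-P_1)^{n-k}\big]\sum_{y:\,\sum_j y_j = k}\Big(1 - \exp\Big\{-\sum_{j\le n} y_j f_j(s)\Big\}\Big). \tag{6.30}$$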


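Let $\lambda(s) := \mathbb{P}_{\pi_s}\{P_1 = 0\}$ for $s \in \Omega$.

Theorem 6.7. Let $\mathcal{F}_n := \sigma(X_1, \dots, X_n)$ and define the partial sums $S_n := X_1 + \dots + X_n$, for $n \in \mathbb{N}$. There is an a.s. unique, completely random hazard measure $H$ such that

$$H(A) = \lim_{n\to\infty}\frac{1}{n}S_n(A) \quad \text{a.s., for every } A \in \mathcal{A}, \tag{6.31}$$

and

$$(X_n)_{n\in\mathbb{N}} \mid H \overset{\text{iid}}{\sim} \mathrm{BeP}(H). \tag{6.32}$$

In particular, $(X_n)_{n\in\mathbb{N}}$ is an exchangeable sequence of Bernoulli processes. For every $s \in \Omega$ and Borel set $B \subseteq [0,1]$, we have $\mathbb{P}\{H\{s\} \in B\} = \kappa^{\pi_s}(H_0\{s\}, B)$. Moreover, writing $\tilde H$ and $\tilde S_n$ for the restrictions of $H$ and $S_n$ to the complement of the set of atoms of $H_0$, the conditional law of $\tilde H$ given $\mathcal{F}_n$ is given, a.s., for every measurable $g \ge 0$, by

$$\log\mathbb{E}^{\mathcal{F}_n}\big[e^{-\tilde H g}\big] = -\lambda\tilde H_0 g - \big(\tilde H_0 \otimes p^{(n,0)}\kappa\big)\big(1 - e^{-g'}\big) + C(\tilde S_n)\log\hat{\bar\rho}^{(n)}e^{-\tilde g}, \tag{6.33}$$

where $\tilde g(s,k,p) := g'(s,p) := p\,g(s)$ and

$$\rho^{(n)}_{s,k} := \lambda(s)\,p^{(n-1,k-1)}(0)\,\delta_0 + p^{(n,k)}\kappa_s, \qquad \kappa(s,dp) := p^{-1}\sigma_1(s,dp) \text{ on } (0,1], \tag{6.34}$$

with $\sigma_1(s,\cdot)$ the law of $P_1$ under $\pi_s$.

Proof. Follows from comparison of Eqs. (6.24) and (6.30) with Theorems 5.1 and 5.2 after taking $\mu := \lambda\tilde H_0$, $\nu := \tilde H_0$, and $\kappa(s,dp) := p^{-1}\sigma_1(s,dp)$. □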
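In the Blackwell-MacQueen case of Remark 6.2, Eq. (6.31) together with $X_n\{s\} = Y_{\tau^s_n}\{s\}$ gives $H\{s\} = \lim_n n^{-1}S_n\{s\} = \sum_k P_k\,Y_{M^s_k}\{s\}$, where the marks $Y_{M^s_k}\{s\}$ are i.i.d. $\mathrm{Bernoulli}(H_0\{s\})$. Assuming the standard fact that the frequencies $(P_k)$ of a Blackwell-MacQueen urn, in order of appearance, follow the GEM($\theta$) stick-breaking law, the Python sketch below (names hypothetical, stick truncated) checks that $H\{s\}$ has the mean $H_0\{s\}$ and variance $H_0\{s\}(1-H_0\{s\})/(\theta+1)$ of the $\mathrm{Beta}(\theta H_0\{s\}, \theta(1-H_0\{s\}))$ law appearing in Eq. (2.16).

```python
import random


def fixed_atom_weight(p0, theta, rng, truncation=60):
    """Approximate draw of H{s} = sum_k P_k * Y_{M_k}{s} in the Blackwell-MacQueen
    case: P_k are GEM(theta) stick-breaking weights and the first-arrival marks
    Y_{M_k}{s} are i.i.d. Bernoulli(p0); the stick is truncated."""
    remaining, total = 1.0, 0.0
    for _ in range(truncation):
        v = 1.0 - rng.random() ** (1.0 / theta)  # V_k ~ Beta(1, theta) by inversion
        if rng.random() < p0:                    # mark of the k-th token
            total += remaining * v
        remaining *= 1.0 - v
    return total
```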


Remark 6.8. Theorems 1.1 and 1.2 follow immediately as corollaries. ◁

Definition 6.9. We will say $H$ is the random hazard measure directing $(X_n)_{n\in\mathbb{N}}$.

Remark 6.10. Recall the family of EPPFs defined in Remark 6.2, corresponding to a continuum of Blackwell-MacQueen urn schemes. In this case we know that the correspondence with the one-parameter scheme implies that the directing random hazard measure is a beta process. ◁

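Remark 6.11. Conditioned on $\mathcal{F}_n$, the directing random hazard measure $H$ is completely random with nonrandom nonatomic component $\lambda\tilde H_0$, fixed atoms at the atoms of $H_0$ together with $\operatorname{supp}S_n$, and ordinary component with Lévy measure

$$p^{-1}(1-p)^n\,\sigma_1(s,dp)\,\tilde H_0(ds). \tag{6.35}$$

For every atom $s$ of $H_0$, the distribution of $H\{s\}$ given $\mathcal{F}_n$ is

$$\overline{p^{(n,S_n\{s\})}\,\kappa^{\pi_s}(H_0\{s\},\cdot)}. \tag{6.36}$$

Informally speaking, for $s \in \operatorname{supp}S_n$ that is not an atom of $H_0$, the distribution of $H\{s\}$ given $\mathcal{F}_n$ is

$$\bar\rho^{(n)}_{s,\,S_n\{s\}}. \tag{6.37}$$

◁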

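6.4. Alternative characterizations of H. For $k \le m \le n \in \mathbb{N}$, define the partial sums

$$X_{n,m} := \sum_{k'=1}^{m} X^{k'}_{n,m}, \qquad X^k_n := \sum_{m'=k}^{n} X^k_{n,m'}, \tag{6.38}$$

$$S_{n,m} := \sum_{j=m}^{n} X_{j,m}, \qquad S^k_n := \sum_{j=k}^{n} X^k_j. \tag{6.39}$$

We will give two complementary characterizations of the directing random hazard measure $H$ on $\Omega \setminus \mathcal{A}_0$ using the identities

$$S_n = \sum_{m=1}^{n} S_{n,m} = \sum_{k=1}^{n} S^k_n. \tag{6.40}$$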

We will recover two classes of representations that have been described in the special case where $\pi$ corresponds to a one- or two-parameter Chinese restaurant process. The first class would be well described as size-biased and corresponds with the first equality in Eq. (6.40). Such a representation was given by Thibaux and Jordan [TJ07] in the one-parameter case and by Teh and Görür [TG09] in the two-parameter case.

The second equality in Eq. (6.40) corresponds with the second class of representations. These are the so-called stick-breaking representations, although this name would perhaps have been best reserved for the Ferguson–Klass-type construction given by Teh, Görür, and Ghahramani [TGG07] of the $c = 1$ instance of the one-parameter case. For the second class, we recover the stick-breaking construction of Paisley, Zaas, Woods, Ginsburg, and Carin [Pai+10], which was later revisited by Paisley, Blei, and Jordan [PBJ12] using the calculus of completely random measures. When $\pi$ corresponds with a two-parameter Chinese restaurant process, we recover the stick-breaking construction described by Broderick, Jordan, and Pitman [BJP12]. In a sense, both classes of alternative representations follow trivially from identities, which, in the stick-breaking case, we note here for the first time. However, the development below describes much more of the probabilistic structure.

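As our focus is on the nonatomic part of $H$, define $\tilde X^k_{n,m} := X^k_{n,m}(\cdot \setminus \mathcal{A}_0)$ and similarly define $\tilde X_n$, $\tilde S_{n,m}$, $\tilde S^k_n$, and $\tilde S_n$. We begin by showing that the columns of $\tilde X_{n,m}$ and $\tilde X^k_n$ are independent: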
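Theorem 6.12. The columns

$$(\tilde X^k_{n,m} : k \le m \text{ and } n \ge m), \qquad m \in \mathbb{N}, \tag{6.41}$$

are independent. The same holds for the columns

$$(\tilde X^k_{n,m} : m \ge k \text{ and } n \ge m), \qquad k \in \mathbb{N}. \tag{6.42}$$

Proof. Let $f = (f^k_{n,m})_{k \le m \le n \in \mathbb{N}}$ be a family of nonnegative measurable functions on $\Omega$ and, for every $i \le n \in \mathbb{N}$, define

$$f^{(i)k}_{n,m} := \begin{cases} f^k_{n,m}, & m = i, \\ 0, & \text{otherwise}, \end{cases} \qquad f^{[i]k}_{n,m} := \begin{cases} f^k_{n,m}, & k = i, \\ 0, & \text{otherwise}. \end{cases} \tag{6.43}$$

For every $i \le n \in \mathbb{N}$, define $e_{i,n} \in \{0,1\}^n$ by $e_{i,n}(j) = 1(i = j)$. It is then straightforward to verify that

$$\Lambda_{f^{(i)}}(s, e_{j,n}) = 1, \quad \text{for } i \ne j, \tag{6.44}$$

and thus

$$1 - \Lambda_f(s,1) = \sum_{j \le n} \big[1 - \Lambda_f(s, e_{j,n})\big] \tag{6.45}$$

$$= \sum_{i \le n} \big(1 - \Lambda_{f^{(i)}}(s,1)\big). \tag{6.46}$$

By Proposition 6.5 and Eq. (6.46),

$$-\log \mathbb{E} \exp\Big\{-\sum_{k \le m \le j \le n} \tilde X^k_{j,m} f^k_{j,m}\Big\} \tag{6.47}$$

$$= H_0\big(1 - \Lambda_f(\cdot,1)\big) \tag{6.48}$$

$$= \sum_{m \le n} H_0\big(1 - \Lambda_{f^{(m)}}(\cdot,1)\big), \tag{6.49}$$

which establishes the first claim. To establish the second claim, it is straightforward to verify that, for $m \le n \in \mathbb{N}$,

$$\sum_{k \le m} \big[1 - \Lambda_{f^{[k]}}(s, e_{m,n})\big] = 1 - \Lambda_f(s, e_{m,n}), \tag{6.50}$$

and therefore

$$1 - \Lambda_f(s,1) = \sum_{m \le n} \big[1 - \Lambda_f(s, e_{m,n})\big] \tag{6.51}$$

$$= \sum_{m \le n} \sum_{k \le m} \big[1 - \Lambda_{f^{[k]}}(s, e_{m,n})\big] \tag{6.52}$$

$$= \sum_{k \le n} \sum_{m=k}^{n} \big[1 - \Lambda_{f^{[k]}}(s, e_{m,n})\big] \tag{6.53}$$

$$= \sum_{k \le n} \sum_{m \le n} \big[1 - \Lambda_{f^{[k]}}(s, e_{m,n})\big] \tag{6.54}$$

$$= \sum_{k \le n} \big(1 - \Lambda_{f^{[k]}}(s,1)\big). \tag{6.55}$$

The result follows as above from Proposition 6.5 and Eq. (6.55). □

We next establish the law of the partial averages $\frac{1}{n} S_{n,m}$ and $\frac{1}{n} S^k_n$.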
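Theorem 6.13. Let $m, k \le n \in \mathbb{N}$. Then

$$\log \mathbb{E} \exp\Big\{-\tfrac{1}{n} S_{n,m} f\Big\} = -\int_\Omega \int_{[0,1]} \big(1 - e^{-p f(s)}\big)\, \mathbb{P}_{\pi_s}\Big\{\tfrac{N_{K_m,n}}{n} \in dp \wedge K_m > K_{m-1}\Big\}\, H_0(ds) \tag{6.56}$$

and

$$\log \mathbb{E} \exp\Big\{-\tfrac{1}{n} S^k_n f\Big\} = -\int_\Omega \int_{[0,1]} \big(1 - e^{-p f(s)}\big)\, \mathbb{P}_{\pi_s}\Big\{\tfrac{N_{k,n}}{n} \in dp\Big\}\, H_0(ds). \tag{6.57}$$

Proof. The result follows in a straightforward fashion from Proposition 6.5. □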

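We can now determine the distributional limits of the partial averages. Note that

$$p^{(m-1,0)}(p)\, \varsigma_1(s,dp) = \mathbb{P}_{\pi_s}\{P_{K_m} \in dp \wedge K_m > K_{m-1}\}. \tag{6.58}$$

Theorem 6.14. As $n \to \infty$, the partial averages $\frac{1}{n} S_{n,m}$ converge in distribution to a random measure $H_m$ given by

$$\log \mathbb{E} \exp\{-H_m f\} = -\int_\Omega \int_{[0,1]} \big(1 - e^{-p f(s)}\big)\, \mathbb{P}_{\pi_s}\{P_{K_m} \in dp \wedge K_m > K_{m-1}\}\, H_0(ds) \tag{6.59}$$

$$= -\big(H_0 \otimes p^{(m-1,0)} \varsigma_1\big)\big(1 - e^{-f'}\big), \tag{6.60}$$

where $f'(s,p) = p f(s)$.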

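Proof. By the continuity of exp and [Kal02, Thm. 16.16], it suffices to prove that

$$\lim_{n\to\infty} \log \mathbb{E} \exp\Big\{-\tfrac{1}{n} S_{n,m} f\Big\} = \log \mathbb{E} \exp\{-H_m f\}, \tag{6.61}$$

for every nonnegative measurable $f$. Define

$$g_n(r) := \int \big(1 - e^{-pr}\big)\, \mathbb{P}_\pi\Big\{\tfrac{N_{K_m,n}}{n} \in dp \wedge K_m > K_{m-1}\Big\}, \quad \text{for } r \in \mathbb{R}_+. \tag{6.62}$$

On $\{K_m > K_{m-1}\}$, we have $N_{K_m,n}/n \to P_{K_m}$ a.s., and thus in distribution, and so, by the boundedness and continuity of $p \mapsto (1 - e^{-pr})$ for $r \in \mathbb{R}_+$, we have

$$\lim_{n\to\infty} g_n(r) = g(r) := \int \big(1 - e^{-pr}\big)\, \mathbb{P}_\pi\{P_{K_m} \in dp \wedge K_m > K_{m-1}\}. \tag{6.63}$$

Let $f$ be a nonnegative measurable function, and let $\Omega_1, \Omega_2, \dots$ be a partition of $\Omega$ such that $H_0(\Omega_i) < \infty$ for every $i \in \mathbb{N}$. Then $g_n \circ f \le 1$, and so, by dominated convergence,

$$\lim_{n\to\infty} \int_{\Omega_i} (g_n \circ f)\, dH_0 = \int_{\Omega_i} (g \circ f)\, dH_0 \tag{6.64}$$

for every $i \in \mathbb{N}$, and so then also on all of $\Omega$, completing the proof. □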

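Define $\varsigma_k(s,dp) = \mathbb{P}_{\pi_s}\{P_k \in dp\}$ for $k \in \mathbb{N}$.

Theorem 6.15. As $n \to \infty$, the partial averages $\frac{1}{n} S^k_n$ converge in distribution to a random measure $H^k$ given by

$$\log \mathbb{E} \exp\{-H^k f\} = -\int_\Omega \int_{[0,1]} \big(1 - e^{-p f(s)}\big)\, \mathbb{P}_{\pi_s}\{P_k \in dp\}\, H_0(ds) \tag{6.65}$$

$$= -\big(H_0 \otimes \varsigma_k\big)\big(1 - e^{-f'}\big), \tag{6.66}$$

where $f'(s,p) = p f(s)$.

Proof. The proof is similar to that of the previous theorem. □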
We now establish that the partial averages converge almost surely, and relate the limiting partial averages to the directing random measure $H$. For every $m \le n \in \mathbb{N}$, let $\bar S_{n,m}$ be the random measure given by

$$\bar S_{n,m}(A) := \zeta(S_{n,m})\big(A \times (0,1]\big), \qquad A \in \mathcal{A}. \tag{6.67}$$

It is straightforward to verify that a.s., for $A \in \mathcal{A}$,

$$\bar S_{n,m}(A) = \#\big(A \cap \operatorname{supp} S_{n,m}\big). \tag{6.68}$$

In the same manner, define $\bar S^k_n$ for every $k \le n \in \mathbb{N}$. By construction, $\bar S_{n+1,m} \ge \bar S_{n,m}$ for all $n \in \mathbb{N}$, and so the limit

$$\bar H_m := \lim_{n\to\infty} \bar S_{n,m} \tag{6.69}$$

exists almost surely and is itself a random measure. The same holds of $\bar H^k := \lim_{n\to\infty} \bar S^k_n$.

Theorem 6.16. Let $k, m \in \mathbb{N}$. The limiting supporting measures $\bar H_m$ and $\bar H^k$ are Poisson processes with intensities $(\varsigma_1 p^{(m-1,0)}) H_0$ and $\varsigma_k H_0$, respectively.

Proof. Follows from the proofs of Eqs. (6.59) and (6.65). □ 

Remark 6.17. Note that, in contrast, the supports of $H_m$ and $H^k$, i.e., $\zeta(H_m)(\cdot \times (0,1])$ and $\zeta(H^k)(\cdot \times (0,1])$, are Poisson processes with intensities $(\varsigma_1[1_{(0,1]}\, p^{(m-1,0)}]) H_0$ and $(\varsigma_k 1_{(0,1]}) H_0$, respectively. In particular, the support of the limiting partial averages no longer contains points that were associated with dust: such points appear only once in the urn scheme, and so their frequency converges to zero. ◁

Corollary 6.18. The limiting supporting measures $\bar H_m$, for $m \in \mathbb{N}$, are mutually singular. The same is true of $\bar H^k$, for $k \in \mathbb{N}$.

Proof. Follows from independence of the measures (Theorem 6.12) and the fact that they are Poisson processes (Theorem 6.16). □

Theorem 6.19. Let $m, k \in \mathbb{N}$. With probability one, for all $A \in \mathcal{A}$, we have

$$\lim_{n\to\infty} \frac{1}{n} S_{n,m}(A) = H(A \cap \operatorname{supp} \bar H_m) \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n} S^k_n(A) = H(A \cap \operatorname{supp} \bar H^k). \tag{6.70}$$

Proof. We prove the former case; the latter follows identically. Let $B$ be the support of $\bar H_m$, and let $A \in \mathcal{A}$. Then, almost surely,

$$H(A \cap B) = \lim_{n\to\infty} \frac{1}{n} S_n(A \cap B) \tag{6.71}$$

$$= \lim_{n\to\infty} \frac{1}{n} \sum_{m'=1}^{n} S_{n,m'}(A \cap B) \tag{6.72}$$

$$= \lim_{n\to\infty} \frac{1}{n} S_{n,m}(A \cap B) \tag{6.73}$$

$$= \lim_{n\to\infty} \frac{1}{n} S_{n,m}(A), \tag{6.74}$$

where Eq. (6.73) follows from Corollary 6.18, and Eq. (6.74) follows from the fact that $\bar S_{n,m} \uparrow \bar H_m$. A Borel probability measure on $\Omega$ is characterized by its values on a countable collection $F \subseteq \mathcal{A}$ of measurable sets, and so the above holds a.s. simultaneously for all sets in any such collection $F$, and hence simultaneously for all $A \in \mathcal{A}$. □
Post facto, we may now take $H_m$ and $H^k$ to be not only the distributional limits of the partial averages but also their almost sure limits, i.e.,

$$H_m = \lim_{n\to\infty} \frac{1}{n} S_{n,m} \quad\text{and}\quad H^k = \lim_{n\to\infty} \frac{1}{n} S^k_n \tag{6.75}$$

almost surely, where the limits are understood in the strong sense.

Remark 6.20. The above development for $H_m$ can be given a more direct proof. Let $m \in \mathbb{N}$. It is straightforward to show that, conditioned on $X_{m,m}$, the Bernoulli processes $X_{m+1,m}, X_{m+2,m}, \dots$ are conditionally i.i.d. From this fact, the existence of the limiting partial average follows from the conditional version of the law of large numbers. There is no obvious analogue of this approach for $H^k$, hence the alternative development above. ◁

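We now relate $\sum_{m=1}^\infty H_m$, $\sum_{k=1}^\infty H^k$, and $H$.

Theorem 6.21. With probability one,

$$H = \lambda H_0 + \sum_{m=1}^{\infty} H_m = \lambda H_0 + \sum_{k=1}^{\infty} H^k. \tag{6.76}$$

Proof. Recall that $H$ is the sum of a nonrandom nonatomic component $\lambda H_0$ and a completely random measure with Lévy measure $H_0 \otimes p^{-1}\varsigma_1$. But the infinite sum $\sum_{m=1}^\infty H_m$ is completely random with Lévy measure given by the infinite sum of the component Lévy measures, yielding

$$\sum_{m=1}^{\infty} H_0 \otimes p^{(m-1,0)} \varsigma_1 = H_0 \otimes \Big(\sum_{m=1}^{\infty} p^{(m-1,0)}\Big) \varsigma_1 = H_0 \otimes p^{-1}\varsigma_1, \tag{6.77}$$

where the last equality follows from the identity

$$p^{-1} = \sum_{i=0}^{\infty} (1-p)^i, \qquad p \in (0,1]. \tag{6.78}$$

Similarly, $\sum_{k=1}^\infty H^k$ is completely random with Lévy measure

$$\sum_{k=1}^{\infty} (H_0 \otimes \varsigma_k) = H_0 \otimes \Big(\sum_{k=1}^{\infty} \varsigma_k\Big) = H_0 \otimes p^{-1}\varsigma_1, \tag{6.79}$$

where the last equality follows from the identity

$$\mathbb{E} f(P_1) = \mathbb{E}\Big(\sum_{k=1}^{\infty} P_k f(P_k)\Big), \quad \text{for measurable } f \text{ satisfying } f(0) = 0. \tag{6.80}$$

In particular, Eq. (6.80) implies that $\varsigma_1(dp) = p\,\big(\sum_k \varsigma_k\big)(dp)$ on $(0,1]$. □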
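The two identities behind this proof, Eqs. (6.78) and (6.80), are easy to sanity-check numerically. The following is a minimal sketch, not part of the original development: it assumes the two-parameter setting of Section 8, in which the frequencies $(P_k)$ of the tokens in order of appearance have the stick-breaking form of Eq. (8.12), it truncates the sticks at a finite level, and it uses the illustrative test function $f(p) = p^2$. The helper name `gem_weights` is hypothetical.

```python
import numpy as np


def gem_weights(alpha, theta, num_samples, truncation, rng):
    """Sample truncated stick-breaking frequencies (P_1, ..., P_J), one row per sample.

    W_j ~ Beta(1 - alpha, theta + j * alpha) and P_j = W_j * prod_{i<j} (1 - W_i),
    as in Eq. (8.12); the mass beyond stick J is simply dropped.
    """
    j = np.arange(1, truncation + 1)
    w = rng.beta(1.0 - alpha, theta + j * alpha, size=(num_samples, truncation))
    # Unbroken stick length before step j: prod_{i<j} (1 - W_i).
    left = np.cumprod(1.0 - w, axis=1)
    before = np.hstack([np.ones((num_samples, 1)), left[:, :-1]])
    return w * before


# Eq. (6.78): partial sums of sum_i (1 - p)^i approach 1/p on (0, 1].
grid = np.linspace(0.05, 1.0, 20)
partial = sum((1.0 - grid) ** i for i in range(2000))
err_678 = np.max(np.abs(partial - 1.0 / grid))

# Eq. (6.80) with f(p) = p**2, which satisfies f(0) = 0.
rng = np.random.default_rng(0)
alpha, theta = 0.3, 1.0
p = gem_weights(alpha, theta, num_samples=10000, truncation=100, rng=rng)
lhs = np.mean(p[:, 0] ** 2)                 # E f(P_1)
rhs = np.mean(np.sum(p * p ** 2, axis=1))   # E sum_k P_k f(P_k)
# E P_1^2 for P_1 ~ Beta(1 - alpha, theta + alpha).
exact = (1 - alpha) * (2 - alpha) / ((1 + theta) * (2 + theta))
print(err_678, lhs, rhs, exact)
```

Both Monte Carlo averages should land close to the exact second moment; the truncation level only needs to make the discarded stick mass negligible for the chosen $\alpha$ and $\theta$.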

Remark 6.22. Theorems 1.3 and 1.4 follow immediately as corollaries. As noted in the Introduction following Theorem 1.3, this result allows one to develop stick-breaking representations like those given by Paisley et al. [Pai+10] for the beta process. ◁

Remark 6.23. Note that, on their own, the identities given in Eqs. (6.78) and (6.80) already yield, via a transfer argument and the calculus of completely random measures, the decomposition given by Eq. (6.76). However, this approach does not reveal the relationship between $H_m$ and $H^k$, the limiting partial averages, and the underlying urn schemes. In particular, the random measures $H^k$ are not measurable with respect to the process $(X_n)_{n\in\mathbb{N}}$. ◁

Remark 6.24. In unpublished, independent work, James, Orbanz, and Teh [JOT14] show that an identity related to Eq. (6.80), in the case $\lambda = 0$, gives rise to the decomposition of $H$ in terms of the $H^k$ as in Eq. (6.76) via a transfer argument (see Remark 6.23). ◁
Remark 6.25. To understand where the nonrandom nonatomic component $\lambda H_0$ arises, let $M \in \mathbb{N}$ and $A \in \mathcal{A}$. Then, with probability one,

$$H(A) = \lim_{n\to\infty} \frac{1}{n} S_n(A) \tag{6.81}$$

$$= \lim_{n\to\infty} \sum_{m=1}^{n} \frac{1}{n} S_{n,m}(A) \tag{6.82}$$

$$= \sum_{m=1}^{M} H_m(A) + \lim_{n\to\infty} \sum_{m=M+1}^{n} \frac{1}{n} S_{n,m}(A). \tag{6.83}$$

Taking $M \to \infty$ leaves the residual $\lim_{M\to\infty} \lim_{n\to\infty} \sum_{m=M+1}^{n} \frac{1}{n} S_{n,m}(A)$. ◁

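We conclude by characterizing the approximation introduced by truncating the first infinite series in Theorem 6.21. Recall the definition of $\Lambda_m$.

Theorem 6.26. Assume that $\mathcal{A}_0 = \emptyset$, let $m \in \mathbb{N}$, and let

$$\hat H := \lambda H_0 + \sum_{m'=1}^{m-1} H_{m'} \tag{6.84}$$

be the finite truncation of $H$, i.e., the sum of only the first $m - 1$ terms of the right-hand side of Eq. (6.76). Conditioned on $H$, let $X$ be a Bernoulli process with hazard measure $H$ and let $\hat X$ be the restriction of $X$ to the complement of the support of $H - \hat H$. Then the expected total mass of the ordinary component of $H - \hat H$, which is also an upper bound on the probability that $X \ne \hat X$, is

$$\int_\Omega \int_{(0,1]} (1-p)^{m-1}\, \varsigma_1(s,dp)\, H_0(ds). \tag{6.85}$$

When $\lambda = 0$, Eq. (6.85) simplifies to $H_0 \Lambda_m$.

Proof. By Markov's inequality and then the chain rule of conditional expectation,

$$\mathbb{P}\{(X - \hat X)(\Omega) \ge 1\} \le \mathbb{E}(X - \hat X)(\Omega) = \mathbb{E}(H - \hat H)(\Omega). \tag{6.86}$$

From Theorem 6.21 and Eq. (6.60),

$$\mathbb{E}(H - \hat H)(\Omega) = \sum_{m'=m}^{\infty} \mathbb{E} H_{m'}(\Omega) \tag{6.87}$$

$$= \sum_{m'=m}^{\infty} \big(H_0 \otimes p^{(m',1)} \varsigma_1\big)\big(\Omega \times (0,1]\big) \tag{6.88}$$

$$= \Big(H_0 \otimes \Big(\sum_{m'=m}^{\infty} p^{(m',1)}\Big) \varsigma_1\Big)\big(\Omega \times (0,1]\big) \tag{6.89}$$

$$= \big(H_0 \otimes p^{(m-1,0)} \varsigma_1\big)\big(\Omega \times (0,1]\big). \tag{6.90}$$

The final claim follows from the definition of $\Lambda_m$ and the fact that $\varsigma_1(s,\{0\}) = 0$ if and only if $\lambda(s) = 0$. □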
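In the homogeneous case with $\lambda = 0$, the bound (6.85) equals $\gamma\, \mathbb{E}(1 - P_1)^{m-1}$ with $\gamma := H_0(\Omega)$, so it can be used to choose a truncation level. The sketch below is illustrative only: it assumes that $\varsigma_1$ is the $\mathrm{Beta}(1 - \alpha, \theta + \alpha)$ law that arises for the two-parameter urn of Section 8, in which case the expectation is a ratio of Beta functions. The function names are hypothetical.

```python
import math

import numpy as np


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def truncation_bound(m, gamma, alpha, theta):
    """gamma * E[(1 - P_1)^(m - 1)] for P_1 ~ Beta(1 - alpha, theta + alpha)."""
    a, b = 1.0 - alpha, theta + alpha
    return gamma * math.exp(log_beta(a, b + m - 1) - log_beta(a, b))


def smallest_truncation(eps, gamma, alpha, theta, m_max=10**6):
    """Smallest m whose bound is at most eps (the bound is decreasing in m)."""
    for m in range(1, m_max + 1):
        if truncation_bound(m, gamma, alpha, theta) <= eps:
            return m
    raise ValueError("no m <= m_max achieves the requested accuracy")


# Monte Carlo check of the closed form for one parameter setting.
rng = np.random.default_rng(1)
gamma, alpha, theta, m = 2.0, 0.3, 1.0, 5
p1 = rng.beta(1.0 - alpha, theta + alpha, size=200_000)
mc = gamma * np.mean((1.0 - p1) ** (m - 1))
print(truncation_bound(m, gamma, alpha, theta), mc)

# With alpha = 0 and theta = 1, P_1 is uniform and the bound is gamma / m.
print(smallest_truncation(0.0101, gamma=2.0, alpha=0.0, theta=1.0))  # 199
```

A linear search is adequate here because the bound is monotone in $m$; for very small tolerances with $\alpha$ close to one, where the bound decays slowly, a bisection over $m$ would be preferable.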


Remark 6.27 (posterior approximation). Theorem 1.5 follows immediately as a corollary. From Theorem 6.12, we can see that $H_{n+1}, H_{n+2}, \dots$ are independent of $\mathcal{F}_n = \sigma(X_{[n]})$ for every $n \in \mathbb{N}$. It follows that these approximation bounds also hold conditionally on $\mathcal{F}_n$ for $n < m$. ◁
7. Combinatorial structure and Indian buffet processes

Here we study the combinatorial structure of a continuum-of-urns scheme induced by a homogeneous family of EPPFs $\pi_s = \pi$ and an i.i.d. sequence $(Y_n)_{n\in\mathbb{N}}$ of Poisson processes whose nonatomic mean measure has total mass $\gamma \in (0, \infty)$. This will lead us to generalizations of the Indian buffet process [GG05]. Certain special cases recover processes proposed in the literature [GGS07; TG09; BJP12].

To begin, note that $[X_1]$ is entirely characterized by $M_1$, the cardinality of $X_1$. Because $X_1 = Y_1$ a.s., the cardinality is Poisson-distributed with mean $\gamma$.

In order to derive the distribution of $[X_{[n+1]}]$, note that $[X_{[n]}]$ is $[X_{[n+1]}]$-measurable due to the fact that

$$M_h = M_{h0} + M_{h1} \quad \text{a.s., for } h \in \mathcal{H}_n. \tag{7.1}$$

Moreover, by the complete randomness of $X_{n+1}$ given $X_{[n]}$, it follows that the random variables $M_{h1}$, for $h \in \{0,1\}^n$, are conditionally independent given $[X_{[n]}]$. (Recall that $M_{0^n1}$ counts the number of atoms appearing for the first time in $X_{n+1}$, and $M_{h1}$, for $h \in \mathcal{H}_n$, counts the number of atoms appearing in $X_{n+1}$ with history $h$.) Indeed, from Theorem 6.7, we know that, conditioned on $[X_{[n]}]$,

(1) $M_{0^n1}$ is Poisson-distributed with mean $\gamma\Lambda_n$, and

(2) $M_{h1}$ is binomially distributed with $M_h$ trials and mean $M_h p_h$, where

$$p_h := \mathbb{P}_\pi\{(h1)^{-1}(1) \in \Pi_{n+1} \mid h^{-1}(1) \in \Pi_n\}.$$

All together, we have

$$\mathbb{P}\big\{[X_{[n+1]}] = (m_h; h \in \mathcal{H}_{n+1}) \,\big|\, [X_{[n]}] = (n_h = m_{h1} + m_{h0}; h \in \mathcal{H}_n)\big\} \tag{7.2}$$

$$= \frac{e^{-\gamma\Lambda_n} (\gamma\Lambda_n)^{m_{0^n1}}}{(m_{0^n1})!} \prod_{h \in \mathcal{H}_n} \frac{n_h!}{m_{h1}!\, m_{h0}!}\, p_h^{m_{h1}} (1 - p_h)^{m_{h0}}. \tag{7.3}$$


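Theorem 7.1 ($\pi$ Indian buffet process).

$$\mathbb{P}\{[X_{[n]}] = (m_h; h \in \mathcal{H}_n)\} = \frac{\exp\big[-\gamma \sum_{i=1}^{n} \mathbb{P}_\pi\{N_{1i} = 1\}\big]}{\prod_{h \in \mathcal{H}_n} (m_h)!} \prod_{h \in \mathcal{H}_n} \big(\gamma\, \mathbb{P}_\pi\{N_{1,s(h)} = s(h) \wedge N_{1n} = s(h)\}\big)^{m_h}. \tag{7.6}$$

Proof. Note that $N_{11} = 1$ a.s., and so Eq. (7.6), for $n = 1$, is precisely the statement that $M_1$ is Poisson distributed with mean $\gamma$, as was to be shown. The result for $n > 1$ then follows by induction on $n$. In particular, multiply Eqs. (7.3) and (7.6) and then apply the following identities:

$$m_{0^n1} + \sum_{h \in \mathcal{H}_n} n_h = m_{0^n1} + \sum_{h \in \mathcal{H}_n} (m_{h0} + m_{h1}) = \sum_{h \in \mathcal{H}_{n+1}} m_h, \tag{7.7}$$

$$\prod_{h \in \mathcal{H}_{n+1}} (m_h)! = (m_{0^n1})! \prod_{h \in \mathcal{H}_n} (m_{h0})!\, (m_{h1})!, \tag{7.8}$$

$$\Lambda_n = \mathbb{P}_\pi\{N_{1,n+1} = 1\} \quad \text{(by exchangeability)}, \tag{7.9}$$

and, by exchangeability,

$$\mathbb{P}_\pi\{N_{1,n+1} = s(h) + z \mid N_{1,s(h)} = s(h) \wedge N_{1n} = s(h)\} \times \mathbb{P}_\pi\{N_{1,s(h)} = s(h) \wedge N_{1n} = s(h)\} = \mathbb{P}_\pi\{N_{1,s(hz)} = s(hz) \wedge N_{1,n+1} = s(hz)\} \tag{7.10}$$

for $z \in \{0,1\}$. □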








Remark 7.2. Note that

$$\mathbb{P}_\pi\{N_{1k} = k \wedge N_{1n} = k\} = \int_{[0,1]} p^{k-1}(1-p)^{n-k}\, \varsigma_1(dp), \tag{7.11}$$

and so Theorem 1.6 follows immediately as a corollary. ◁

Given its derivation from an exchangeable sequence, there is, of course, no dependence on the ordering of the underlying sequence in the distribution of $[X_{[n]}]$, and indeed this is another way of noting that the combinatorial stochastic process is itself exchangeable in the following sense:

Theorem 7.3 (exchangeability). Let $\sigma$ be a permutation of $[n]$, and, for $h \in \mathcal{H}_n$, consider the composition $h \circ \sigma \in \mathcal{H}_n$ given by $(h \circ \sigma)(j) = h(\sigma(j))$, for $j \le n$. Then

$$(M_h)_{h \in \mathcal{H}_n} \overset{d}{=} (M_{h \circ \sigma})_{h \in \mathcal{H}_n}.$$

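The following proof establishes the exchangeability directly.

Proof. Define $m_{n,k} := \sum_{h \in \mathcal{H}_n :\, s(h) = k} m_h$ and note that $\sum_{h \in \mathcal{H}_n :\, s(h \circ \sigma) = k} m_h = m_{n,k}$ because $s(h \circ \sigma) = s(h)$. The right-hand side of Eq. (7.6) can be written

$$\frac{\exp\big[-\gamma \sum_{i=1}^{n} \mathbb{P}_\pi\{N_{1i} = 1\}\big]}{\prod_{h \in \mathcal{H}_n} (m_h)!} \prod_{k=1}^{n} \big(\gamma\, \mathbb{P}_\pi\{N_{1k} = k \wedge N_{1n} = k\}\big)^{m_{n,k}}. \tag{7.12}$$

Finally, note that $h \mapsto h \circ \sigma$ is a permutation of $\mathcal{H}_n$, and so $\prod_{h \in \mathcal{H}_n} (m_h)! = \prod_{h \in \mathcal{H}_n} (m_{h \circ \sigma})!$. □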


7.0.1. Alternative representations of $[X_{[n]}]$ and their distributions. A convenient way to represent $[X_{[n]}]$ is via a binary array/matrix $W$ such that, for every $h \in \mathcal{H}_n$, there are exactly $M_h$ columns of $W$ equal to $h$ (where $h$ is thought of as a column vector) and no all-zero columns. The rows thus correspond to the measures $X_1, \dots, X_n$, and the columns to the pattern of sharing for some particular atom. Note that it is possible that $W$ has zero columns, which corresponds to the case when $X_1 = \dots = X_n = 0$.

In order to determine a distribution over arrays, we must specify the ordering of the columns. Griffiths and Ghahramani [GG05] developed the IBP using array representations and, in doing so, introduced a canonical left-ordered form, which can be defined as follows.

Write $\prec$ for the total order on $\mathcal{H}_n$ such that $g \prec h$ if and only if, for some $m \in [n]$ and all $i < m$, we have $g(i) = h(i)$ and $g(m) > h(m)$. (That is, $\prec$ is the lexicographic ordering except that $1 \prec 0$. As an illustration, $(1,1) \prec (1,0) \prec (0,1)$.) The array $W$ is in left-ordered form if adjacent columns are either equal or ordered according to $\prec$.

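More precisely, let $C := \sum_{h \in \mathcal{H}_n} M_h$ denote the number of atoms among $X_1, \dots, X_n$, and define

$$H(j) := \sup\Big\{h \in \mathcal{H}_n : \sum_{g \prec h} M_g < j\Big\}, \tag{7.13}$$

where the supremum is taken with respect to $\prec$. (If $M_{(1,1)} = 0$, $M_{(1,0)} = 3$, and $M_{(0,1)} = 1$, then $C = 4$, $H^{-1}(1,0) = \{1,2,3\}$ and $H^{-1}(0,1) = \{4,5,\dots\}$ a.s.) We may then express $W$ by

$$W = [H(1) \cdots H(C)], \tag{7.14}$$

where each $H(j)$ is viewed as a column vector. That is, $W \in \{0,1\}^{n \times C}$ a.s., and $W_{ij} = 1$ implies that $X_i$ contains the atom labeled $j$, and those rows that also have a 1 in column $j$ correspond to the measures that share this atom.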
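The left-ordered form is straightforward to compute from the counts $(M_h)$. The sketch below is our own illustration, with hypothetical function names: histories are tuples of 0s and 1s, sorting on the negated entries realizes the order $\prec$ (lexicographic with 1 ranked before 0), and the last helper evaluates the number of distinct orderings of the $C$ columns, $C!/\prod_h M_h!$, which is the count discussed next. The example reproduces the illustration given after Eq. (7.13).

```python
from math import factorial, prod


def precedes(g, h):
    """True iff g precedes h: lexicographic order on histories with 1 ranked before 0."""
    for gi, hi in zip(g, h):
        if gi != hi:
            return gi > hi
    return False


def left_ordered_form(counts):
    """Columns of W in left-ordered form, given counts M_h keyed by histories h."""
    # Sorting on the negated entries puts 1 before 0 at the first differing position.
    ordered = sorted(counts, key=lambda h: tuple(-x for x in h))
    columns = []
    for h in ordered:
        columns.extend([h] * counts[h])
    return columns


def as_rows(columns, n):
    """Turn a list of length-n column tuples into an n x C list of rows."""
    return [[col[i] for col in columns] for i in range(n)]


def num_orderings(counts):
    """Number of distinct orderings of the C columns: C! / prod_h M_h!."""
    c = sum(counts.values())
    return factorial(c) // prod(factorial(m) for m in counts.values())


counts = {(1, 1): 0, (1, 0): 3, (0, 1): 1}
cols = left_ordered_form(counts)
print(as_rows(cols, 2))       # [[1, 1, 1, 0], [0, 0, 0, 1]]
print(num_orderings(counts))  # 4
```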
Because every feature allocation of $[n]$ corresponds with a unique binary array in left-ordered form, the probability of a realization of $W$ is precisely the probability of the (unique) realization of $[X_{[n]}]$ that gives rise to the left-ordered form.

Another ordering that has been studied is the uniform random labeling [BPJ13, p. 9]. Informally, an array $W^*$ is labeled uniformly at random if it is equal to $W$ after a permutation of $W$'s columns which is uniformly distributed among all permutations of $[C]$. More carefully, let $U_1, U_2, \dots$ be an i.i.d. sequence of uniformly distributed random variables independent from $[X_{[n]}]$. Associate with column $j$ of $W$ the label $U_j$ and then let $W^*$ be the array obtained by sorting the columns of $W$ in the a.s. unique increasing order of their labels.

Note that the number of distinct ways of ordering the $C$ columns of $W$ is

$$\frac{C!}{\prod_{h \in \mathcal{H}_n} M_h!}, \tag{7.15}$$

where the denominator arises from the fact that, for each column equal to $h$, there are $M_h$ indistinguishable copies. This leads immediately to the following result:


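Theorem 7.4. Let $w \in \{0,1\}^{n \times k}$ be a binary matrix with $k \ge 0$ nonzero columns and $n$ rows, and, for every $j \le k$, let $S_j$ be the sum of column $j$. Then

$$\mathbb{P}\{W^* = w\} = \pi^*(n; S_1, \dots, S_k) \tag{7.16}$$

$$:= \frac{\gamma^k}{k!} \exp\Big[-\gamma \sum_{i=1}^{n} \mathbb{P}_\pi\{N_{1i} = 1\}\Big] \prod_{j=1}^{k} \mathbb{P}_\pi\{N_{1,S_j} = S_j \wedge N_{1n} = S_j\}, \tag{7.17}$$

where $\pi^*(n; \cdot)$ is a symmetric function on $[n]^k$ for every $n \in \mathbb{N}$ and $k \in \mathbb{Z}_+$.

Proof. The symmetry of $\pi^*(n; \cdot)$ is manifest. The result follows from dividing Eq. (7.6) by Eq. (7.15) and then the definition of $S_j$. □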


In the language of [BPJ13], the function $\pi^*$ is an exchangeable feature probability function (EFPF), which plays the role for exchangeable feature allocations analogous to that played by EPPFs for exchangeable partitions. Theorem 7.4 implies that every EPPF $\pi$ induces an EFPF $\pi^*$, via the distribution of $P_1$ induced by $\pi$, which characterizes the combinatorial structure of a homogeneous continuum of urns with EPPF $\pi$ and a nonatomic hazard measure.


8. Example: a continuum-of-two-parameter-urns scheme 

Teh and Görür [TG09] describe a three-parameter generalization of the IBP that exhibits power-law behavior by introducing a discount parameter that was understood to play a role similar to that of the discount parameter in the two-parameter Hoppe urn scheme [Eng78] and its underlying combinatorial stochastic process, the two-parameter Chinese restaurant process (CRP) [Pit96]. The three-parameter generalization is shown to correspond to the combinatorial structure of an exchangeable sequence of Bernoulli processes directed by a class of random measures that Teh and Görür called stable beta processes. Broderick, Jordan, and Pitman [BJP12] study the same process and establish a number of asymptotic results characterizing the rate of growth of the number of features, showing that it follows power laws.

As we will see, the similarity between the combinatorial structure of the three-parameter IBP and the two-parameter CRP reflects a deeper connection: the three-parameter IBP can be shown to correspond with the combinatorial structure of a continuum-of-urns scheme $(X_n)_{n\in\mathbb{N}}$ induced by the EPPF corresponding to the two-parameter CRP and a nonatomic mean measure. By specializing the results of Section 6, we will make this connection precise.


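8.1. Two-parameter Chinese restaurant process. A well-studied EPPF is that corresponding with the two-parameter CRP. In particular, consider the function $\varpi$ on $\mathbb{N}^*$ given by

$$\varpi(n_1, \dots, n_k) = \frac{[\theta + \alpha]_{k-1;\alpha}}{[\theta + 1]_{n-1}} \prod_{i=1}^{k} [1 - \alpha]_{n_i - 1}, \qquad n := n_1 + \dots + n_k, \tag{8.1}$$

where $\theta$ and $\alpha$ satisfy

$$0 \le \alpha < 1 \quad\text{and}\quad \theta > -\alpha \tag{8.2}$$

or

$$\alpha = -\kappa < 0 \quad\text{and}\quad \theta = m\kappa \text{ for some } m = 2, 3, \dots \text{ and } \kappa > 0; \tag{8.3}$$

and

$$[x]_{m;\alpha} := \begin{cases} 1 & \text{for } m = 0, \\ x(x + \alpha)\cdots(x + (m-1)\alpha) & \text{for } m \in \mathbb{N}, \end{cases} \tag{8.4}$$

and $[x]_m := [x]_{m;1}$. The EPPF corresponding to the (one-parameter) CRP, which we introduced earlier in Remark 6.2, is obtained by taking $\alpha = 0$.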
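As a quick consistency check on Eq. (8.1), the weights $\varpi$ summed over all set partitions of $[n]$ should equal one, in both parameter regimes (8.2) and (8.3). The following sketch, with helper names of our own choosing, evaluates Eq. (8.1) using the generalized rising factorial of Eq. (8.4) and enumerates set partitions recursively.

```python
from math import prod


def rising(x, m, step=1.0):
    """Generalized rising factorial [x]_{m;step} = x (x + step) ... (x + (m - 1) step)."""
    return prod(x + i * step for i in range(m))


def eppf_two_param(block_sizes, theta, alpha):
    """Two-parameter CRP EPPF of Eq. (8.1) evaluated at block sizes (n_1, ..., n_k)."""
    n, k = sum(block_sizes), len(block_sizes)
    numerator = rising(theta + alpha, k - 1, alpha) * prod(
        rising(1.0 - alpha, c - 1) for c in block_sizes
    )
    return numerator / rising(theta + 1.0, n - 1)


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                    # `first` opens a new block
        for i in range(len(partition)):                # or joins an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def total_mass(n, theta, alpha):
    """Sum of the EPPF over all set partitions of {0, ..., n - 1}."""
    return sum(
        eppf_two_param([len(b) for b in part], theta, alpha)
        for part in set_partitions(list(range(n)))
    )


print(total_mass(5, theta=1.5, alpha=0.3))    # regime (8.2)
print(total_mass(5, theta=1.0, alpha=-0.5))   # regime (8.3) with kappa = 0.5, m = 2
```

In regime (8.3) the factor $[\theta + \alpha]_{k-1;\alpha}$ vanishes as soon as $k > m$, so partitions with more than $m$ blocks receive zero weight, as expected for an urn with finitely many colors.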

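Let $(Z_n)_{n\in\mathbb{N}}$ be a $\varpi$-scheme with parameters $\theta$ and $\alpha$ and assume $Z_1 \sim U(0,1)$. The sequence $(Z_n)_{n\in\mathbb{N}}$ is also known as a two-parameter Hoppe urn. The conditional distribution of $Z_{n+1}$ given $Z_{[n]}$ is

$$\frac{\theta + K_n \alpha}{\theta + n}\, U(0,1) + \sum_{j=1}^{K_n} \frac{N_{jn} - \alpha}{\theta + n}\, \delta_{\tilde Z_j}, \tag{8.5}$$

where $K_n$, $N_{jn}$, and $\tilde Z_j$ are defined as in Section 4.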


8.2. Directing random hazard measure. Let $(X_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme with hazard measure $H_0$ and EPPF $\varpi$ with parameters $\theta$ and $\alpha$. (One could also consider allowing $\theta$ and $\alpha$ to vary across the space in a measurable way as in Remark 6.2, but we will focus on the homogeneous case.) It follows from Theorem 6.7 that there exists a random hazard measure $H$ directing $(X_n)_{n\in\mathbb{N}}$, and that the law of $H$ is completely determined by $\varpi$ and $H_0$. We now proceed to characterize $H$.


8.2.1. Nonrandom component. We have $\mathbb{P}_\varpi\{\sum_{k \ge 1} P_k = 1\} = 1$, and so $\lambda = 0$. Therefore $H$ is a.s. purely atomic.


8.2.2. Ordinary component. We know that the distribution of the ordinary component of $H$ is determined by $H_0$ and $\varsigma_1$, i.e., the distribution of the a.s. limiting frequency $P_1$ of the first token. Under the two-parameter Chinese restaurant process, it is known that $P_1$ is beta-distributed with concentration $\theta + 1$ and mean $(1 - \alpha)/(\theta + 1)$, and thus $\varsigma_1$ is absolutely continuous with respect to Lebesgue measure with probability density

$$p \mapsto \frac{\Gamma(\theta + 1)}{\Gamma(1 - \alpha)\Gamma(\theta + \alpha)}\, p^{-\alpha}(1-p)^{\theta + \alpha - 1}. \tag{8.6}$$

It follows that the intensity of the ordinary component of $H$ is

$$(ds, dp) \mapsto \frac{\Gamma(\theta + 1)}{\Gamma(1 - \alpha)\Gamma(\theta + \alpha)}\, p^{-1-\alpha}(1-p)^{\theta + \alpha - 1}\, dp\, H_0(ds). \tag{8.7}$$

Thus the ordinary component of $H$ is a so-called stable beta process, as defined by Teh and Görür [TG09]. Note that when $\alpha = 0$, we recover the ordinary component of a beta process, as expected from Remark 6.2. Teh and Görür show that a stable beta process underlies a three-parameter IBP scheme, which we will derive from the continuum-of-urns perspective below. The connection with the two-parameter CRP is now manifest.


8.2.3. The fixed-atomic component. If $H_0$ is nonatomic, then the directing hazard measure $H$ is composed of only an ordinary component. The continuum-of-urns perspective, however, also characterizes $H$ when $H_0$ has atoms, which will be seen to be useful when we define a hierarchical stable beta process. Recall that, for an atom $s \in \mathcal{A}_0$ of $H_0$ such that $H_0\{s\} = q$, we have

$$H\{s\} \sim \kappa_{0,0}(q, \cdot\,) = \mathbb{P}_\varpi\{Q_q \in \cdot\,\}. \tag{8.8}$$

Recall that $Q_q = \nu[0,q]$. When $\alpha = 0$, we know that $\nu$ is a Dirichlet process and so

$$(\nu[0,q],\, \nu(q,1]) \tag{8.9}$$

is Dirichlet-distributed with concentration $\theta$ and mean vector $(q, 1-q)$, and thus

$$H\{s\} \sim \mathrm{Beta}\big(\theta H_0\{s\},\, \theta(1 - H_0\{s\})\big), \tag{8.10}$$

which agrees with the definition of the beta process [Hjo90; TJ07].

For a general two-parameter CRP, we cannot say as much. When $\alpha = -\kappa < 0$ and $\theta > \kappa$ is an integer multiple $m$ of $\kappa$, it is understood that $\nu$ will be supported on a finite i.i.d. set $\{Z_1, \dots, Z_m\}$, with the probability masses jointly Dirichlet-distributed with all $m$ parameters equal to $\kappa$. Letting $M := \#\{i \le m : Z_i \le q\}$, we have

$$\nu[0,q] \mid M \sim \mathrm{Beta}\big(M\kappa,\, (m - M)\kappa\big), \tag{8.11}$$

where $M$ is binomially distributed with $m$ trials and mean $mq$. Unlike the case when $\alpha \ge 0$, there is positive probability that $H\{s\} \in \{0,1\}$, and so the distribution of $H\{s\}$ is not even absolutely continuous, although it is a mixture of beta distributions, and hence absolutely continuous on the event $\{0 < H\{s\} < 1\}$.

When $0 \le \alpha < 1$ and $\theta > -\alpha$, the distribution of $H\{s\}$ can be simulated to arbitrary accuracy using the stick-breaking characterization of $(P_n)_{n\in\mathbb{N}}$. In particular, there is a collection of independent random variables $(W_n)_{n\in\mathbb{N}}$ with $W_j \sim \mathrm{Beta}(1 - \alpha, \theta + j\alpha)$ such that

$$P_j = W_j \prod_{i=1}^{j-1} (1 - W_i) \quad \text{a.s.} \tag{8.12}$$

By the definition of $Q_q$, we know that there exists an i.i.d. process $(T_n)_{n\in\mathbb{N}}$ of Bernoulli random variables with mean $q$, independent of $(W_n)_{n\in\mathbb{N}}$, such that $Q_q = \sum_{j=1}^{\infty} P_j T_j$ a.s. For any $\epsilon > 0$, we can truncate this sum at a finite level $J$ and compute an approximation $\hat Q_q$ such that $|Q_q - \hat Q_q| < \epsilon$ a.s. (the omitted tail is at most the unbroken stick length $\prod_{i \le J}(1 - W_i)$, which is observed). By including additional terms, which we can compute as needed, this approximation can be tightened on demand. (The framework of computable probability theory would allow us to make more precise statements about computability. In particular, $Q_q$ has a computable distribution, uniformly in $q \in [0,1]$, in the sense of [Wei99]. See [Roy11] for more details.)
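The truncation scheme just described can be sketched in a few lines. This is an illustration under the stated assumptions ($0 \le \alpha < 1$, $\theta > -\alpha$), not a reference implementation, and the function name is hypothetical: sticks are broken until the unbroken length, which bounds the omitted tail $\sum_{j > J} P_j T_j$, falls below $\epsilon$, so the returned value undershoots $Q_q$ by at most the returned bound.

```python
import random


def sample_q(q, theta, alpha, eps, rng):
    """Approximate one draw of Q_q = sum_j P_j T_j using the sticks of Eq. (8.12).

    Returns (approximation, bound) with 0 <= Q_q - approximation <= bound < eps.
    """
    total, left, j = 0.0, 1.0, 0
    while left >= eps:
        j += 1
        w = rng.betavariate(1.0 - alpha, theta + j * alpha)  # W_j
        if rng.random() < q:        # T_j ~ Bernoulli(q), independent of the sticks
            total += w * left       # P_j = W_j * prod_{i<j} (1 - W_i)
        left *= 1.0 - w             # unbroken stick length after j breaks
    return total, left


rng = random.Random(1)
draws = [sample_q(0.4, theta=1.0, alpha=0.3, eps=1e-4, rng=rng) for _ in range(5000)]
mean_q = sum(x for x, _ in draws) / len(draws)
print(mean_q)  # close to q = 0.4, since E Q_q = q
```

Repeating the call with a smaller `eps` tightens a draw on demand in the sense described above only if the same sticks and coin flips are reused; this sketch draws fresh randomness on every call and so illustrates the error bound, not the refinement.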

Despite the explicit sampling rule for $H\{s\}$, there appears to be no simple expression for its distribution in terms of $H_0\{s\}$, $\theta$, and $\alpha$, although the work of James, Lijoi, and Prünster [JLP08] on the distribution of linear functionals of $\nu$ provides an approach to this problem. For example, using [JLP08, Thm. 2.1], one can identify situations where the distribution of $H\{s\}$ is absolutely continuous with respect to Lebesgue measure.
8.3. Conditional law. Theorem 6.7 characterizes the conditional law of $H$ given $X_{[n]}$ in terms of the kernel $\kappa$. The distributions $\kappa_{n,k}(p, \cdot)$, for $p > 0$, can be simulated along the lines described above, and these simulations can be used to produce MCMC algorithms for more complicated computations. On the other hand, it is straightforward to show that

$$\kappa_{n,k}(0, \cdot\,) = \mathrm{Beta}(k - \alpha,\, n - k + \theta + \alpha), \tag{8.13}$$

which is the conditional distribution of the mass of an atom of $H$ appearing $k$ times among the ordinary components of $X_{[n]}$.
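Eq. (8.13) makes the posterior of an ordinary atom explicit. As a small illustrative sketch (helper names are ours; it assumes $1 \le k \le n$ and the parameter range (8.2)), the posterior mean of $\mathrm{Beta}(k - \alpha, n - k + \theta + \alpha)$ coincides with the reappearance probability $(k - \alpha)/(\theta + n)$ of the two-parameter Hoppe urn, the same ratio that appears in Eq. (8.19) below.

```python
import random


def ordinary_atom_posterior(k, n, theta, alpha):
    """Beta parameters of kappa_{n,k}(0, .) in Eq. (8.13) for an atom seen k times in n draws."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return k - alpha, n - k + theta + alpha


def reappearance_probability(k, n, theta, alpha):
    """Posterior mean of H{s}, which simplifies to (k - alpha) / (theta + n)."""
    a, b = ordinary_atom_posterior(k, n, theta, alpha)
    return a / (a + b)


theta, alpha, n, k = 1.0, 0.3, 10, 3
a, b = ordinary_atom_posterior(k, n, theta, alpha)
rng = random.Random(0)
samples = [rng.betavariate(a, b) for _ in range(50_000)]
print(reappearance_probability(k, n, theta, alpha), (k - alpha) / (theta + n))
print(sum(samples) / len(samples))  # Monte Carlo estimate of the same posterior mean
```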


8.4. Connection with the three-parameter IBP. Assume that $H_0$ is nonatomic, and let $\gamma := H_0(\Omega) > 0$ denote the total mass of the hazard measure. Because $H_0$ is nonatomic, $H$ has only an ordinary component, which we know to be that of a stable beta process, and so we may conclude from the work of Teh and Görür that the combinatorial structure of $(X_n)_{n\in\mathbb{N}}$ must be that of the three-parameter IBP.

Regardless, it is instructive to revisit the probability mass function of the combinatorial structure given by Theorem 7.1 in this special case, as we see that it depends on the EPPF only through the probabilities $\mathbb{P}_\varpi\{N_{1k} = k \wedge N_{1n} = k\}$, for $1 \le k \le n \in \mathbb{N}$.

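By exchangeability, $\mathbb{P}_\varpi\{N_{1,n+1} = 1\} = \mathbb{P}_\varpi\{K_{n+1} > K_n\}$, and so both are equal to the probability of the event that a new token appears on the $(n+1)$-st iteration of the two-parameter Hoppe urn (equivalently, that a new table is allocated in a two-parameter Chinese restaurant process). Recall that $\Lambda_n := \mathbb{P}_\varpi\{K_{n+1} > K_n\}$. We have

$$\Lambda_n = \mathbb{E}_\varpi\Big[\frac{\varpi(N_{1n}, \dots, N_{K_n n}, 1)}{\varpi(N_{1n}, \dots, N_{K_n n})}\Big] = \mathbb{E}_\varpi\Big[\frac{\theta + K_n\alpha}{\theta + n}\Big].$$

Conditioning on $K_n$ and then averaging, we have

$$\Lambda_{n+1} = \mathbb{E}_\varpi\Big[\Big(1 - \frac{\theta + K_n\alpha}{\theta + n}\Big)\frac{\theta + K_n\alpha}{\theta + n + 1} + \frac{\theta + K_n\alpha}{\theta + n}\cdot\frac{\theta + (K_n + 1)\alpha}{\theta + n + 1}\Big] \tag{8.14}$$

$$= \Lambda_n\, \frac{\theta + \alpha + n}{\theta + 1 + n}, \tag{8.15}$$

and thus, since $\Lambda_0 = 1$,

$$\Lambda_n = \frac{[\theta + \alpha]_n}{[\theta + 1]_n} = \frac{\Gamma(\theta + 1)\, \Gamma(n + \theta + \alpha)}{\Gamma(n + \theta + 1)\, \Gamma(\theta + \alpha)}. \tag{8.16}$$

It follows that the number of new features appearing at stage $n + 1$ is Poisson distributed with mean $\gamma\Lambda_n$. Teh and Görür [TG09] derived the distribution of the number of new features (and in particular Eq. (8.16)) from Eq. (8.7) via the calculus of completely random measures, but the connection with the combinatorics of an underlying two-parameter model was not made.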
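The recursion (8.15) and the closed form (8.16) are easy to cross-check, and both can be compared with a direct simulation of the two-parameter Hoppe urn using the predictive rule (8.5). The sketch below is illustrative (function names are hypothetical); its urn only tracks token counts, since the token values play no role in whether a draw is new.

```python
import math
import random


def lambda_recursive(n, theta, alpha):
    """Lambda_n from Lambda_0 = 1 and the recursion (8.15)."""
    lam = 1.0
    for i in range(n):
        lam *= (theta + alpha + i) / (theta + 1.0 + i)
    return lam


def lambda_closed_form(n, theta, alpha):
    """Lambda_n from the Gamma-function expression in Eq. (8.16)."""
    return math.exp(
        math.lgamma(theta + 1.0) + math.lgamma(n + theta + alpha)
        - math.lgamma(n + theta + 1.0) - math.lgamma(theta + alpha)
    )


def new_token_frequency(n, theta, alpha, trials, rng):
    """Monte Carlo estimate of P{K_{n+1} > K_n} for the two-parameter Hoppe urn (8.5)."""
    hits = 0
    for _ in range(trials):
        counts = []                                  # counts[j]: copies of the (j+1)-st token
        for m in range(n + 1):                       # m draws have been made so far
            new_mass = theta + alpha * len(counts)
            u = rng.random() * (theta + m)           # total weight is theta + m
            if m == 0 or u < new_mass:               # the first draw is always new
                counts.append(1)
                if m == n:
                    hits += 1
                continue
            u -= new_mass
            for j, c in enumerate(counts):           # an old token has weight N_j - alpha
                u -= c - alpha
                if u < 0:
                    counts[j] += 1
                    break
            else:                                    # guard against rounding at the boundary
                counts[-1] += 1
    return hits / trials


theta, alpha = 1.0, 0.3
print([round(lambda_closed_form(n, theta, alpha), 4) for n in range(6)])
print(new_token_frequency(5, theta, alpha, trials=20_000, rng=random.Random(2)))
```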

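By exchangeability, the probability $\mathbb{P}_\varpi\{N_{1k} = k \wedge N_{1n} = k\}$ is also that of the event that a two-parameter Hoppe urn admits a new token on the $(n - k + 1)$-st iteration, and then admits $k - 1$ additional copies of this token in a row. Admitting a new token occurs with probability $\Lambda_{n-k}$, and admitting this new token $k - 1$ additional times occurs with probability

$$\frac{1 - \alpha}{\theta + n - k + 1} \cdots \frac{k - 1 - \alpha}{\theta + n - 1} = \frac{\Gamma(k - \alpha)}{\Gamma(1 - \alpha)}\, \frac{\Gamma(\theta + n - k + 1)}{\Gamma(\theta + n)}, \tag{8.17}$$

and so

$$\mathbb{P}_\varpi\{N_{1k} = k \wedge N_{1n} = k\} = \frac{\Gamma(k - \alpha)}{\Gamma(1 - \alpha)}\, \frac{\Gamma(\theta + 1)}{\Gamma(\theta + n)}\, \frac{\Gamma(n - k + \theta + \alpha)}{\Gamma(\theta + \alpha)}, \tag{8.18}$$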
which, as expected, leads to a probability mass function that agrees with [TG09, Eq. (10)]. The terms appearing in Eq. (8.17) are connected to the continuum-of-urns scheme by noting that, informally speaking, conditioned on $X_1\{s\} = 1$, the probability that $X_{n+1}\{s\} = 1$ is

$$\frac{N_{1n} - \alpha}{\theta + n}. \tag{8.19}$$

By exchangeability, the right-hand side of Eq. (8.19) is also the probability conditioned on $X_j\{s\} = 1$ for some $j \in [n]$, and so governs the probability of an element recurring given that it has already appeared.
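Eq. (8.18) can be cross-checked in two ways: against the product of $\Lambda_{n-k}$ and Eq. (8.17), and against the Beta integral of Remark 7.2 with $\varsigma_1 = \mathrm{Beta}(1 - \alpha, \theta + \alpha)$. A further check is the identity $\sum_{k=1}^{n} \binom{n}{k} \mathbb{P}_\varpi\{N_{1k} = k \wedge N_{1n} = k\} = \sum_{i=0}^{n-1} \Lambda_i$, which holds because both sides equal the expected number of distinct tokens among the first $n$ draws. The sketch below (helper names are ours) evaluates all three in log-space via `math.lgamma`.

```python
import math


def lam(n, theta, alpha):
    """Lambda_n of Eq. (8.16)."""
    return math.exp(math.lgamma(theta + 1) + math.lgamma(n + theta + alpha)
                    - math.lgamma(n + theta + 1) - math.lgamma(theta + alpha))


def q_closed_form(k, n, theta, alpha):
    """P{N_1k = k and N_1n = k} from Eq. (8.18)."""
    return math.exp(math.lgamma(k - alpha) - math.lgamma(1 - alpha)
                    + math.lgamma(theta + 1) - math.lgamma(theta + n)
                    + math.lgamma(n - k + theta + alpha) - math.lgamma(theta + alpha))


def q_via_recurrence(k, n, theta, alpha):
    """Lambda_{n-k} times the probability (8.17) of k - 1 repeats in a row."""
    run = 1.0
    for j in range(1, k):            # j-th repeat: weight j - alpha out of theta + n - k + j
        run *= (j - alpha) / (theta + n - k + j)
    return lam(n - k, theta, alpha) * run


def q_via_beta_integral(k, n, theta, alpha):
    """Eq. (7.11) with P_1 ~ Beta(1 - alpha, theta + alpha), as a ratio of Beta functions."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_beta(k - alpha, n - k + theta + alpha)
                    - log_beta(1 - alpha, theta + alpha))


theta, alpha, n = 1.5, 0.3, 8
lhs = sum(math.comb(n, k) * q_closed_form(k, n, theta, alpha) for k in range(1, n + 1))
rhs = sum(lam(i, theta, alpha) for i in range(n))
print(lhs, rhs)
```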

To summarize, we have:

Theorem 8.1 (combinatorial structure of two-parameter urn scheme). Let $(X_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme with nonatomic hazard measure $H_0$ and EPPF $\varpi$. Then $[(X_n)_{n\in\mathbb{N}}]$ is a three-parameter IBP with mass parameter $\gamma := H_0(\Omega)$, concentration parameter $\theta$, and discount parameter $\alpha$.
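Theorem 8.1, together with Eqs. (8.16) and (8.19), suggests a sequential description: row $n+1$ takes each existing feature, previously taken by $m$ rows, independently with probability $(m - \alpha)/(\theta + n)$, and then adds a Poisson number of new features with mean $\gamma\Lambda_n$. The following NumPy sketch simulates this scheme; it is an illustration (names and the set-based row representation are our choices), assuming the parameter range (8.2). Two useful checks are that every row has, on average, $\gamma$ features, since $\mathbb{E} H(\Omega) = \gamma$ by Eq. (8.7), and that the expected total number of features after $n$ rows is $\gamma \sum_{i=0}^{n-1} \Lambda_i$.

```python
import math

import numpy as np


def lambda_n(n, theta, alpha):
    """Lambda_n of Eq. (8.16)."""
    return math.exp(math.lgamma(theta + 1) + math.lgamma(n + theta + alpha)
                    - math.lgamma(n + theta + 1) - math.lgamma(theta + alpha))


def three_parameter_ibp(num_rows, gamma, theta, alpha, rng):
    """Sequentially sample rows (as sets of feature indices) of a three-parameter IBP."""
    counts = []   # counts[j]: number of rows so far that contain feature j
    rows = []
    for n in range(num_rows):                       # n rows have been generated
        row = set()
        for j, m in enumerate(counts):              # reuse old features, cf. Eq. (8.19)
            if rng.random() < (m - alpha) / (theta + n):
                row.add(j)
        for _ in range(rng.poisson(gamma * lambda_n(n, theta, alpha))):
            counts.append(0)                        # a brand-new feature
            row.add(len(counts) - 1)
        for j in row:
            counts[j] += 1
        rows.append(row)
    return rows


rng = np.random.default_rng(0)
gamma, theta, alpha, num_rows = 3.0, 1.0, 0.3, 10
runs = [three_parameter_ibp(num_rows, gamma, theta, alpha, rng) for _ in range(2000)]
mean_row_size = np.mean([len(r) for rows in runs for r in rows])
mean_features = np.mean([len(set().union(*rows)) for rows in runs])
expected_features = gamma * sum(lambda_n(i, theta, alpha) for i in range(num_rows))
print(mean_row_size, mean_features, expected_features)
```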

8.5. Hierarchical stable beta processes. It is worth mentioning that Teh and Görür do not explicitly propose a definition for the fixed-atomic component of a stable beta process, although they did establish the beta law Eq. (8.13). From a conjugacy perspective, it might then seem natural to consider a fixed-atomic component governed by a kernel of the form

$$q \mapsto \mathrm{Beta}\big(f(q,\theta,\alpha),\, g(q,\theta,\alpha)\big), \tag{8.20}$$

for suitable functions $f, g > 0$. In an article on beta negative binomial processes, Broderick, Mackey, Paisley, and Jordan [Bro+11] make such a proposal, taking

$$f(q,\theta,\alpha) := \theta\gamma q - \alpha > 0 \quad\text{and}\quad g(q,\theta,\alpha) := \theta(1 - \gamma q) + \alpha > 0 \tag{8.21}$$

for some $\gamma > 0$. Except in a trivial case, no such kernel (even one of the general form given in Eq. (8.20)) corresponds with a continuum-of-urns scheme with a stable beta process ordinary component.

Theorem 8.2. Consider a continuum-of-urns scheme with a kernel of the form given in Eq. (8.20) for some functions $f, g > 0$. Then the ordinary component is given by Eq. (8.7) only if $\alpha = 0$, i.e., only if $H$ is a beta process.

Proof. Let $h(q,\theta,\alpha) := g(q,\theta,\alpha)/f(q,\theta,\alpha)$ and let $\kappa'$ denote the kernel given by Eq. (8.20). We have $\lim_{q\downarrow 0} \kappa'(q) = \delta_0$, which implies that $\lim_{q\downarrow 0} h(q,\theta,\alpha) = \infty$. On the other hand, the law $p\,q^{-1}\kappa'(q, dp)$ converges weakly to $\beta(1 - \alpha, \theta + \alpha)$ as $q \downarrow 0$. This implies that $f(q,\theta,\alpha) \to -\alpha$ and $g(q,\theta,\alpha) \to \theta + \alpha$. Together, these imply that $\alpha = 0$, which is then seen to be sufficient by noting that this is simply a beta process, which has the form Eq. (8.20) for $f(q,\theta) = q\theta$ and $g(q,\theta) = \theta(1 - q)$. □

8.6. Other generalizations. In this section we have connected existing work on exchangeable sequences of Bernoulli processes directed by stable beta processes with continua of two-parameter Hoppe urn schemes. Another class of EPPFs that would be natural to investigate is the class of so-called Gibbs-type EPPFs [GP05].


9. The continuum limit 

The following theorem states that the limiting distribution of the discrete models 
presented in the introduction is indeed that of a continuum-of-urns scheme. 
Theorem 9.1. Assume $\pi$ is continuous on $\Omega$. Let $H^1, H^2, \dots$ be a sequence of purely atomic hazard measures strongly converging to $H_0$, i.e., for every $A \in \mathcal{A}$,

$$H^m(A) \to H_0(A) \quad \text{as } m \to \infty. \tag{9.1}$$

For every $m$, let $(X^m_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme with hazard measure $H^m$, and let $(X_n)_{n\in\mathbb{N}}$ be a continuum-of-urns scheme with hazard measure $H_0$. Then $(X^m_n)_{n\in\mathbb{N}}$ converges in distribution to $(X_n)_{n\in\mathbb{N}}$ as $m \to \infty$.

Proof. By [Kal02, Thm. 4.29], it suffices to show that $X^m_{[n]}$ converges to $X_{[n]}$ in distribution as $m \to \infty$ for every $n \in \mathbb{N}$. Fix $n \in \mathbb{N}$. It is straightforward to show that $X^m_{[n]}\{s\}$ converges in distribution to $X_{[n]}\{s\}$ as $m \to \infty$ for every atom $s \in \mathcal{A}_{H_0}$ of $H_0$.

As the measures are completely random, it therefore suffices to prove convergence on the complement $\Omega \setminus \mathcal{A}_{H_0}$, and so we will assume without loss of generality that $\mathcal{A}_{H_0} = \emptyset$. Moreover, $H_0$ is $\sigma$-finite, and so we can partition $\Omega$ into a countable partition $\Omega_1, \Omega_2, \dots$ such that $H_0(\Omega_k)$ is finite for every $k \in \mathbb{N}$. The restrictions of $X_{[n]}, X^1_{[n]}, X^2_{[n]}, \dots$ to each subset $\Omega_k$ are independent, and so we can, without loss of generality, also assume that $H_0$ is finite.

For every $m \in \mathbb{N}$, let $(Y^m_n)_{n\in\mathbb{N}}$ be the i.i.d. sequence of Poisson processes underlying $(X^m_n)_{n\in\mathbb{N}}$, and, with $n$ fixed as above, define $R^m_n = Y^m_1 + \dots + Y^m_n$. Define $(Y_n)_{n\in\mathbb{N}}$ and $R_n$ similarly. From Proposition 6.4, we know there is a probability kernel $\nu$ satisfying $\nu(R_n) = \mathbb{P}[X_{[n]} \in \cdot \mid R_n]$ a.s. and $\nu(R^m_n) = \mathbb{P}[X^m_{[n]} \in \cdot \mid R^m_n]$ a.s. for every $m \in \mathbb{N}$. The claim is that $\nu$ is a continuous map from the subspace $\mathcal{N}$ of locally finite simple Borel measures to the space of Borel measures on $\mathcal{N}^n$. Then so is the map taking the distribution of a random element $R$ in $\mathcal{N}$ to the distribution of $\nu(R)$. Therefore, by [Kal02, Thm. 16.24], it suffices to show that $R^m_n$ converges to $R_n$ in distribution.

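By independence of the $Y_n$ (respectively, the $Y^m_n$), the random measure $R_n$ is a Poisson process with intensity $nH_0$ (respectively, $R^m_n$ is a Poisson process with intensity $nH^m$), and so, by [Kal02, Thm. 16.17], it suffices to show that (1) $\mathbb{P}\{R^m_n A = 0\} \to \mathbb{P}\{R_n A = 0\}$ for every $A \in \mathcal{A}$, and (2) $\limsup_m \mathbb{E}(R^m_n K) \le \mathbb{E}(R_n K) < \infty$ for all compact sets $K \subseteq \Omega$.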

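We have

$$\mathbb{P}\{R^m_n A = 0\} = \prod_{s \in \mathcal{A}_{H^m} \cap A} \mathbb{P}\{R^m_n\{s\} = 0\} \tag{9.2}$$

$$= \prod_{s \in \mathcal{A}_{H^m} \cap A} e^{-nH^m\{s\}} = e^{-nH^m(A)} \to e^{-nH_0(A)} = \mathbb{P}\{R_n A = 0\}, \tag{9.3}$$

which establishes (1). For (2), note that

$$\mathbb{E}(R^m_n K) = \sum_{s \in \mathcal{A}_{H^m} \cap K} nH^m\{s\} = nH^m(K) \to nH_0(K) = \mathbb{E}(R_n K) < \infty, \tag{9.4}$$

completing the proof. □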
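The computation in (9.2) and (9.3) uses only that the counts $R^m_n\{s\}$ at distinct atoms are independent Poisson variables with means $nH^m\{s\}$, so the void probability depends on $H^m$ only through $H^m(A)$. The Python sketch below checks this by simulation for a hypothetical finite list of atom masses. Python's standard `random` module has no Poisson sampler, so the sketch swaps in Knuth's multiplication method, which is adequate for the small means used here; the helper names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def void_probability_mc(masses, n, trials, seed=0):
    """Monte Carlo estimate of P{R A = 0}, where R{s} ~ Poisson(n * H{s})
    independently over the atoms s of H in A (masses lists the values H{s})."""
    rng = random.Random(seed)
    hits = sum(
        all(poisson_sample(n * h, rng) == 0 for h in masses)
        for _ in range(trials)
    )
    return hits / trials


masses = [0.05, 0.1, 0.2, 0.15]  # hypothetical atom masses H^m{s}, s in A
n = 2
exact = math.exp(-n * sum(masses))  # = prod over s of exp(-n H^m{s}) = exp(-n H^m(A))
estimate = void_probability_mc(masses, n, trials=20_000)
print(exact, estimate)
```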

Remark 9.2. The proof fails if the convergence is merely weak. To see this, take $H^m := \tfrac{1}{2}(\delta_{m^{-1}} + \delta_{m^{-2}})$. Then $H^m \to \delta_0$ weakly, but a Bernoulli process with hazard measure $H^m$ converges weakly to a point process concentrated on $\{0\}$ whose total mass is binomially distributed with mean 1 and variance 1/2. In contrast, a Bernoulli process with mean $\delta_0$ is almost surely $\delta_0$ itself. ◁
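The binomial law in Remark 9.2 can be checked exactly: the total mass of a Bernoulli process with finitely many atoms is a sum of independent Bernoulli($H\{s\}$) variables, and the locations of the atoms play no role, so for every $m \ge 2$ the total mass is $\mathrm{Binomial}(2, 1/2)$. The Python sketch below (helper name hypothetical) enumerates all outcomes with exact rational arithmetic and contrasts this with a single atom of mass 1.

```python
from fractions import Fraction
from itertools import product


def bernoulli_total_mass_law(atom_masses):
    """Exact law of the total mass of a Bernoulli process whose hazard measure
    is purely atomic with the given (finitely many) atom masses in [0, 1]."""
    law = {}
    for bits in product((0, 1), repeat=len(atom_masses)):
        prob = Fraction(1)
        for bit, h in zip(bits, atom_masses):
            prob *= h if bit else 1 - h
        law[sum(bits)] = law.get(sum(bits), 0) + prob
    return law


def mean_and_variance(law):
    mean = sum(k * p for k, p in law.items())
    return mean, sum(k * k * p for k, p in law.items()) - mean ** 2


half = Fraction(1, 2)
law = bernoulli_total_mass_law([half, half])  # H^m = (delta_{1/m} + delta_{1/m^2}) / 2
mean, var = mean_and_variance(law)
print(law, mean, var)                         # Binomial(2, 1/2): mean 1, variance 1/2
print(mean_and_variance(bernoulli_total_mass_law([Fraction(1)])))  # delta_0: mean 1, variance 0
```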

Having established the relationship between the continuum limit described in Section 1 and the continuum-of-urns scheme, we see that Theorems 1.1, 1.2, and 1.4 to 1.6 now follow as special cases of their counterparts in Section 6.
A. Transfer arguments

Transfer arguments translate distributional equalities into existence claims for 
random variables on extensions of the underlying probability space. The interested 
reader is advised to consult [Kal02, Chp. 6]. 

Theorem A.1 (transfer [Kal02, Prop. 6.10]). For any measurable space $S$ and Borel space $T$, let $\xi \overset{d}{=} \tilde\xi$ and $\eta$ be random elements in $S$ and $T$, respectively. Then there exists a random element $\tilde\eta$ in $T$ with $(\tilde\xi, \tilde\eta) \overset{d}{=} (\xi, \eta)$. More precisely, there exists a measurable function $f : S \times [0,1] \to T$ such that we may take $\tilde\eta = f(\tilde\xi, \vartheta)$ whenever $\vartheta \perp\!\!\!\perp \tilde\xi$ is $U(0,1)$.
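When $S$ and $T$ are finite, one valid choice of $f$ in Theorem A.1 is a conditional quantile function: $f(s,\cdot)$ maps consecutive subintervals of $[0,1)$ of lengths $\mathbb{P}\{\eta = t \mid \xi = s\}$ to the points $t$. The Python sketch below covers only this finite special case (the general Borel case needs more care); the joint law and the helper `make_transfer_map` are hypothetical. It checks by simulation that $(\tilde\xi, f(\tilde\xi, \vartheta))$ reproduces the joint law of $(\xi, \eta)$.

```python
import random
from collections import Counter
from fractions import Fraction


def make_transfer_map(joint, t_values):
    """Return f(s, u): the conditional u-quantile of eta given xi = s.

    joint maps pairs (s, t) to probabilities (missing pairs have probability 0);
    t_values lists the finite space T in a fixed order.
    """
    def f(s, u):
        p_s = sum(joint.get((s, t), 0) for t in t_values)  # P{xi = s}, assumed > 0
        cumulative = Fraction(0)
        for t in t_values:
            cumulative += Fraction(joint.get((s, t), 0)) / p_s
            if u < cumulative:  # u falls in the subinterval assigned to t
                return t
        return t_values[-1]  # unreachable for u in [0, 1), since cumulative ends at 1
    return f


# Hypothetical joint law of (xi, eta) on S = {"a", "b"}, T = [0, 1, 2].
joint = {("a", 0): Fraction(1, 10), ("a", 1): Fraction(3, 10),
         ("b", 0): Fraction(2, 5), ("b", 2): Fraction(1, 5)}
t_values = [0, 1, 2]
f = make_transfer_map(joint, t_values)

rng = random.Random(1)
p_a = sum(p for (s, _), p in joint.items() if s == "a")  # P{xi = "a"} = 2/5
trials = 50_000
counts = Counter()
for _ in range(trials):
    xi_tilde = "a" if rng.random() < p_a else "b"  # xi_tilde has the law of xi
    vartheta = rng.random()                        # U(0, 1), independent of xi_tilde
    counts[(xi_tilde, f(xi_tilde, vartheta))] += 1

# Empirical joint law of (xi_tilde, f(xi_tilde, vartheta)) versus the target law.
for pair in sorted(joint):
    print(pair, counts[pair] / trials, float(joint[pair]))
```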

Corollary A.2 (stochastic equations [Kal02, Prop. 6.11]). Fix two Borel spaces $S$ and $T$, a measurable mapping $f : T \to S$, and some random elements $\xi$ in $S$ and $\eta$ in $T$ with $\xi \overset{d}{=} f(\eta)$. Then there exists a random element $\tilde\eta \overset{d}{=} \eta$ in $T$ with $\xi = f(\tilde\eta)$ a.s.
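For finite $S$ and $T$, one way to obtain $\tilde\eta$ in Corollary A.2 is to draw it from the conditional law of $\eta$ given $f(\eta) = \xi$, using an independent uniform variable. Then $f(\tilde\eta) = \xi$ holds pathwise, and $\mathbb{P}\{\tilde\eta = y\} = \mathbb{P}\{\xi = f(y)\}\,\mathbb{P}\{\eta = y\}/\mathbb{P}\{f(\eta) = f(y)\} = \mathbb{P}\{\eta = y\}$ because $\xi \overset{d}{=} f(\eta)$. The Python sketch below illustrates this finite special case with a hypothetical map $f(y) = y \bmod 3$ and a uniform $\eta$; the helper names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def solve_stochastic_equation(eta_law, f):
    """Return g(x, u) with f(g(x, u)) == x: a draw from the law of eta given f(eta) = x.

    eta_law maps the points of a finite space T to probabilities; x must be a value
    that f takes with positive probability, and u should lie in [0, 1).
    """
    def g(x, u):
        fiber = [(y, p) for y, p in eta_law.items() if p > 0 and f(y) == x]
        total = sum(p for _, p in fiber)
        cumulative = Fraction(0)
        for y, p in fiber:
            cumulative += p / total
            if u < cumulative:
                return y
        return fiber[-1][0]  # unreachable for u in [0, 1); still a point of the fiber
    return g


def f(y):
    """Hypothetical measurable map T -> S with T = {0, ..., 5} and S = {0, 1, 2}."""
    return y % 3


eta_law = {y: Fraction(1, 6) for y in range(6)}  # eta uniform on T
g = solve_stochastic_equation(eta_law, f)

rng = random.Random(2)
trials = 30_000
counts = Counter()
equation_holds = True
for _ in range(trials):
    xi = rng.randrange(3)                # xi has the law of f(eta): uniform on S
    eta_tilde = g(xi, rng.random())      # built from xi and an independent uniform
    equation_holds = equation_holds and f(eta_tilde) == xi  # xi = f(eta_tilde) pathwise
    counts[eta_tilde] += 1

print(equation_holds, {y: counts[y] / trials for y in range(6)})
```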

Lemma A.3 (conditional independence and randomization [Kal02, Prop. 6.13]). Let $\xi$, $\eta$, and $\zeta$ be random elements in some measurable spaces $S$, $T$, and $U$, respectively, where $S$ is Borel. Then $\xi \perp\!\!\!\perp_\eta \zeta$ if and only if $\xi = f(\eta, \vartheta)$ a.s. for some measurable function $f : T \times [0,1] \to S$ and some $U(0,1)$ random variable $\vartheta \perp\!\!\!\perp (\eta, \zeta)$.
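The "if" direction of Lemma A.3 can be seen in simulation: when $\xi = f(\eta, \vartheta)$ with $\vartheta$ independent of $(\eta, \zeta)$, the conditional law of $\xi$ given $(\eta, \zeta)$ depends on $\eta$ alone, even though $\zeta$ and $\eta$ are dependent. The Python sketch below uses hypothetical binary laws and a hypothetical map $f$, chosen so that $\mathbb{P}\{\xi = 1 \mid \eta\} = 0.3 + 0.4\eta$.

```python
import random
from collections import Counter

rng = random.Random(3)


def f(eta, vartheta):
    """Hypothetical measurable map T x [0, 1] -> S: xi = 1 with probability 0.3 + 0.4*eta."""
    return int(vartheta < 0.3 + 0.4 * eta)


trials = 100_000
joint_counts = Counter()   # counts of (eta, zeta)
xi_one_counts = Counter()  # counts of (eta, zeta) on which xi = 1
for _ in range(trials):
    eta = int(rng.random() < 0.5)
    zeta = eta ^ int(rng.random() < 0.2)  # zeta depends on eta (flipped with prob. 0.2)
    vartheta = rng.random()               # U(0, 1), independent of (eta, zeta)
    xi = f(eta, vartheta)
    joint_counts[(eta, zeta)] += 1
    xi_one_counts[(eta, zeta)] += xi

# Estimated P{xi = 1 | eta, zeta}: it should depend on eta only (about 0.3 or 0.7).
cond = {key: xi_one_counts[key] / joint_counts[key] for key in sorted(joint_counts)}
print(cond)
```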